CN114359920A

CN114359920A - Image processing method, device, equipment and storage medium

Info

Publication number: CN114359920A
Application number: CN202011065951.9A
Authority: CN
Inventors: 王倩; 林彬彬; 邓佳康
Original assignee: Beijing Xiaomi Mobile Software Co Ltd
Current assignee: Beijing Xiaomi Mobile Software Co Ltd
Priority date: 2020-09-30
Filing date: 2020-09-30
Publication date: 2022-04-15

Abstract

The application discloses an image processing method, apparatus, device and storage medium. The method comprises: identifying image content of N images; when the image content of the N images is identified as containing document data, intercepting the document data images from the N images to obtain M intercepted images; and splicing the M intercepted images and outputting a spliced file in an electronic document format. In the scheme provided by the embodiments of the application, the document data image is intercepted from each image containing document data, the intercepted document data images are spliced, and the spliced file is output in the electronic document format, which reduces the time needed to sort out documents such as PPT (PowerPoint) slides or courseware and improves efficiency.

Description

Image processing method, device, equipment and storage medium

Technical Field

The present invention relates generally to the field of image technology, and in particular, to an image processing method, apparatus, device, and storage medium.

Background

With the development of technology, presenting document data such as PPT (PowerPoint) slides or courseware has become very common in meetings, training and teaching. Lecturing from PPT slides or courseware is convenient for the lecturer, since it avoids the inefficiency of writing on a whiteboard or blackboard in real time. It is, however, inconvenient for listeners: because the lecturer no longer spends time writing, the lecture proceeds quickly and listeners cannot take notes in time.

At present, most listeners record the contents of documents such as PPT (PowerPoint) slides or courseware by video recording or photographing, and sort out the documents after the lecture is finished. This approach is inefficient.

Disclosure of Invention

In view of the above-mentioned drawbacks or deficiencies in the prior art, it is desirable to provide an image processing method, apparatus, device, and storage medium.

In a first aspect, the present application provides an image processing method, including:

identifying image content of the N images;

when the image content of the N images is identified to contain document data, intercepting the document data images from the N images to obtain M intercepted images;

splicing the M intercepted images, and outputting a spliced file in an electronic document format;

wherein N is a positive integer, and M is a positive integer less than or equal to N.

In one embodiment, the image is a video frame image;

before identifying the image content of the N images, the method further comprises the following steps:

acquiring a mark point of a mark record in a target video;

and determining the video frame image corresponding to the mark point in the target video according to the mark point.

In one embodiment, before acquiring the mark point for marking the record in the target video, the method further includes:

receiving a marker input on a target video in a target video recording process or a target video playing process;

marking a mark point in a corresponding video frame image in the target video in response to the mark input;

wherein, each mark point corresponds to a video frame image.

In one embodiment, stitching the captured images comprises:

acquiring the playing time order, in a target video, of the document data images corresponding to the M intercepted images;

determining a first splicing sequence of the M intercepted images according to the playing time sequence;

and splicing the M intercepted images according to the first splicing sequence.

In one embodiment, stitching the captured images comprises:

determining the document page number of the document data image corresponding to the M intercepted images;

determining a second splicing sequence of the M intercepted images according to the document page number;

and splicing the M intercepted images according to a second splicing sequence.

In one embodiment, the step of capturing the document material image from the N images to obtain M captured images includes:

when identical document data images exist among the document data images intercepted from the N images, taking one of the identical document data images as an intercepted image.

In one embodiment, when it is identified that an included angle exists between any one boundary of the document data contained in the image content and the corresponding boundary of the image where the document data is located, and the included angle is greater than an included angle threshold,

intercepting the document material image from the N images, comprising:

and performing perspective correction clipping on the document data.

In one embodiment, the electronic document format includes any one of a presentation file format, a PDF format, a rich text format, a word format, and a text editing system document format.

In one embodiment, the documentation includes any one of PPT documents and courseware documents.

In a second aspect, the present application provides an image processing apparatus comprising:

the identification module is used for identifying the image content of the N images;

the intercepting module is used for intercepting the document data image from the N images to obtain M intercepted images when the image content of the N images contains document data;

and the output module is used for splicing the M intercepted images and outputting a spliced file in an electronic document format.

In a third aspect, the present application provides an apparatus comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the image processing method as in the first aspect when executing the program.

In a fourth aspect, the present application provides a readable storage medium having stored thereon a computer program which, when executed by a processor, implements the image processing method of the first aspect.

According to the technical scheme provided by the embodiments of the application, the document data image is intercepted from each image containing document data, the intercepted document data images are spliced, and the spliced file is output in the electronic document format, which shortens the time needed to sort out documents such as PPT (PowerPoint) slides or courseware and improves efficiency.

Drawings

Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:

fig. 1 is a schematic flowchart of an image processing method according to an embodiment of the present invention;

fig. 2 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present invention;

fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings.

In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

The terms "first," "second," "third," "fourth," and the like in the description and in the claims of the present application and in the drawings described above, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described are capable of operation in sequences other than those illustrated or otherwise described herein.

Moreover, the terms "comprises," "comprising," and any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or modules is not necessarily limited to those steps or modules explicitly listed, but may include other steps or modules not expressly listed or inherent to such process, method, article, or apparatus.

It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.

At present, presenting document data such as PPT (PowerPoint) slides or courseware is very common in meetings, training and teaching. Lecturing from PPT slides or courseware is convenient for the lecturer, since it avoids the inefficiency of writing on a whiteboard or blackboard in real time. However, because the lecturer no longer spends time writing, the lecture proceeds quickly and listeners cannot take notes in time.

At present, most listeners record the contents of documents such as PPT (PowerPoint) slides or courseware by video recording or photographing, and sort out the PPT slides or courseware after the lecture is finished. This approach is inefficient.

Based on the above problems, the present application is expected to provide an image processing method, which has high efficiency and high user satisfaction when document materials such as PPT or courseware recorded in a video or photographing manner are sorted.

The method can be applied to terminal equipment provided with a camera, wherein the terminal equipment can be a mobile phone, a tablet personal computer, a notebook computer, an intelligent helmet, intelligent glasses, a telephone watch and the like.

In the image processing method provided by the embodiment of the present invention, the execution main body may be an image processing apparatus, and the image processing apparatus may be implemented as part or all of a terminal device by software, hardware, or a combination of software and hardware. In the following method embodiments, the execution subject is a terminal device as an example.

Referring to fig. 1, a flowchart illustrating an image processing method according to an embodiment of the present application is shown.

As shown in fig. 1, an image processing method may include:

S110, identifying the image content of the N images.

Specifically, the image may be a video frame image (i.e., a frame image corresponding to a certain frame in a video), or may be a picture image (e.g., a photo taken by a camera, a screenshot image, etc.). The image can be obtained directly from a terminal device recording the video or the picture, can also be obtained from a storage device storing the recorded video or the picture, can also be obtained by downloading, and the like, and the form and the obtaining mode of the image are not limited.

Identifying the image content of an image may be accomplished with a trained neural network model. Identification may also be made in other ways.

And if the image is a picture image, directly inputting the acquired picture image into the neural network model to finish the image content identification of the image.

If the image is a video frame image, the acquired video needs to be processed to obtain the video frame image.

In one embodiment, the image is a video frame image, and before identifying the image content of the N images, the method further comprises:

acquiring a mark point of a mark record in a target video;

Specifically, the target video is a video recorded by the user, a stored video, or a downloaded video, in which mark points have been recorded. The number of recorded mark points may be N, where N is a positive integer. In this embodiment, each video frame image is the video frame image corresponding to a mark point in the target video; marking N mark points therefore marks N corresponding video frame images.

wherein, each mark point corresponds to a video frame image.

Specifically, while recording or playing a video, a user can add mark points to the video according to actual needs. Mark points can also be added automatically at a preset time interval, which can be set according to actual needs. It can be understood that if the preset interval is too long, images whose content contains document data may be missed; if the preset interval is too short, the same document data may appear in many marked images, so that more images need to be identified and image identification takes longer. The preset interval can be set according to the learning and training of the neural network model.
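The interval-based marking described above can be sketched as follows. This is only an illustrative sketch: the function name `interval_mark_points` and the choice to place the first mark point at time 0 are assumptions, not part of the application.

```python
def interval_mark_points(duration_s: float, interval_s: float) -> list[float]:
    """Return the timestamps (in seconds) at which mark points are placed.

    Assumption: the first mark point is placed at time 0 and one more is
    placed every `interval_s` seconds until the end of the video.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    points = []
    k = 0
    # Multiply instead of accumulating, to avoid floating-point drift.
    while k * interval_s <= duration_s:
        points.append(k * interval_s)
        k += 1
    return points


# A short interval yields many mark points (more images to identify);
# a long interval yields few (pages shown between marks may be missed).
dense = interval_mark_points(60, 1)
sparse = interval_mark_points(60, 30)
```

The trade-off stated in the text is visible in the counts: the shorter interval produces many more frames to identify than the longer one.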

Mark points can also be marked manually: while recording or playing the video, the user marks a mark point each time the lecturer turns a page of the PPT (PowerPoint) slides or courseware.

Mark points can also be marked by using an algorithm that judges in real time whether the document data contained in the image content of adjacent frames has changed. If a change is detected, a mark point can be marked automatically, or a pop-up window can ask the user whether to mark, and the user can choose according to actual needs. It should be noted that mark points may also be marked on the video in other ways, which is not limited herein.
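The application does not name the algorithm used to judge whether adjacent frames have changed. As one possible illustration, the sketch below uses the mean absolute difference of grayscale pixel values. The function names, the threshold value, and the choice to always mark the first frame are assumptions.

```python
Frame = list[list[int]]  # grayscale pixel rows, values 0..255


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Mean absolute pixel difference between two equally sized frames."""
    total = 0
    count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count if count else 0.0


def auto_mark_points(frames: list[Frame], threshold: float = 10.0) -> list[int]:
    """Return indices of frames whose content differs from the previous frame.

    Assumption: the first frame is always marked, so that the page shown at
    the start of the video is not lost.
    """
    marks = [0] if frames else []
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold:
            marks.append(i)
    return marks


page_a = [[0, 0], [0, 0]]
page_b = [[200, 200], [200, 200]]
marks = auto_mark_points([page_a, page_a, page_b, page_b, page_a])
```

In an interactive variant, each detected change could trigger the pop-up query instead of marking directly.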

After the target video and the mark points marked and recorded in the target video are obtained, the marked video frame image can be determined according to the mark points in the target video. When the image content of the image is identified, all marked video frame images can be input into the neural network model, and then whether the image content of the video frame images contains document materials or not can be determined.
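Determining the video frame image for a mark point can be sketched as a timestamp-to-index conversion. The sketch assumes that a mark point is stored as a time offset in seconds and that the video has a constant frame rate; neither detail is specified in the application.

```python
def frame_index_for_mark(mark_time_s: float, fps: float, frame_count: int) -> int:
    """Map a mark point (seconds from the start) to a video frame index.

    Assumptions: constant frame rate, and marks past the end of the video
    are clamped to the last frame.
    """
    if fps <= 0 or frame_count <= 0:
        raise ValueError("fps and frame_count must be positive")
    index = int(mark_time_s * fps)  # truncation equals floor for non-negative times
    return max(0, min(index, frame_count - 1))


# Each mark point corresponds to exactly one video frame image.
marked_frames = [frame_index_for_mark(t, 30, 300) for t in (0.0, 2.0, 100.0)]
```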

S120, when the image content of the N images contains the document materials, intercepting the document material images from the N images to obtain M intercepted images.

Specifically, when it is identified that the image content of an image contains document data, the image may be too dark or overexposed because of the recording environment. In that case, the image is first processed into the normal brightness range; this processing may adopt the prior art and is not described herein again. Optionally, the document data may include any one of a PPT document, a courseware document, and the like.
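The application leaves the brightness processing to the prior art. Purely as an illustration, the sketch below applies a simple gain so that the mean brightness of an image that is too dark or overexposed moves to a target value. The range limits, the target, and the function names are assumptions.

```python
Image = list[list[int]]  # grayscale pixel rows, values 0..255


def mean_brightness(img: Image) -> float:
    pixels = [p for row in img for p in row]
    return sum(pixels) / len(pixels)


def normalize_brightness(img: Image, low: float = 60.0, high: float = 200.0,
                         target: float = 128.0) -> Image:
    """Scale pixel values so the mean brightness moves to `target`.

    Images whose mean already lies in [low, high] are returned unchanged
    (as a copy). The thresholds are illustrative assumptions.
    """
    mean = mean_brightness(img)
    if low <= mean <= high or mean == 0:
        return [row[:] for row in img]
    gain = target / mean
    return [[min(255, round(p * gain)) for p in row] for row in img]


dark = [[10, 30], [20, 20]]
normal = [[100, 120], [110, 130]]
fixed = normalize_brightness(dark)
```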

The boundary of the document data in the processed image is then detected with a boundary identification technique, and the image is cropped according to the detected boundary. It can be understood that, to make the cropped document data image look better, the four boundaries can each be extended outwards by a preset length before cropping (that is, the left boundary is extended leftwards, the right boundary rightwards, the upper boundary upwards, and the lower boundary downwards). The preset lengths in the four directions can be equal or unequal and can be set according to actual requirements.
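Extending the detected boundary outwards before cropping can be sketched as follows. The `Box` type, the function names, and the clamping of the extended boundary to the image size are assumptions added for illustration.

```python
from typing import NamedTuple


class Box(NamedTuple):
    left: int
    top: int
    right: int   # exclusive
    bottom: int  # exclusive


def expand_box(box: Box, img_w: int, img_h: int, pad_left: int = 0,
               pad_top: int = 0, pad_right: int = 0, pad_bottom: int = 0) -> Box:
    """Extend each boundary outwards by its own preset length.

    Assumption: the extended boundary is clamped so that it never leaves
    the image.
    """
    return Box(
        max(0, box.left - pad_left),
        max(0, box.top - pad_top),
        min(img_w, box.right + pad_right),
        min(img_h, box.bottom + pad_bottom),
    )


def crop(img: list[list[int]], box: Box) -> list[list[int]]:
    """Cut the region inside `box` out of a row-major image."""
    return [row[box.left:box.right] for row in img[box.top:box.bottom]]


detected = Box(10, 10, 50, 40)
padded = expand_box(detected, img_w=60, img_h=45,
                    pad_left=5, pad_top=5, pad_right=20, pad_bottom=20)
```

The four preset lengths are independent parameters, matching the statement that they can be equal or unequal.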

Specifically, because the document data images intercepted from the N images may include identical document data images, it may be determined whether the document data contained in the intercepted images is the same. If so, one of the intercepted images corresponding to the same document data is retained and the other intercepted images corresponding to that document data are removed.

When judging whether the intercepted images contain the same document data, a comparison algorithm for the text in the images can be used to compare the document data contained in all the intercepted images.

In this way, since identical document data images may exist among the intercepted document data images, the number of intercepted images may be smaller than or equal to the number of images; that is, the number M of intercepted images is a positive integer less than or equal to the number N of images.
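The removal of identical document data images can be sketched with a text comparison. The sketch assumes that the text of each intercepted image has already been recognized (the recognition step is not shown), and it uses `difflib.SequenceMatcher` from the Python standard library as a stand-in for the unspecified comparison algorithm. The similarity threshold is an assumption.

```python
from difflib import SequenceMatcher


def same_document(text_a: str, text_b: str, min_ratio: float = 0.9) -> bool:
    """Treat two pages as the same document data when their text is near-identical."""
    return SequenceMatcher(None, text_a, text_b).ratio() >= min_ratio


def deduplicate(texts: list[str], min_ratio: float = 0.9) -> list[int]:
    """Return indices of the intercepted images to keep.

    The first occurrence of each page is retained and later copies are
    dropped, so the result has M <= N entries.
    """
    kept: list[int] = []
    for i, text in enumerate(texts):
        if not any(same_document(text, texts[j], min_ratio) for j in kept):
            kept.append(i)
    return kept


recognized = [
    "Agenda: goals and timeline",
    "Results for Q3",
    "Agenda: goals and timeline",  # the lecturer jumped back to the first page
]
kept = deduplicate(recognized)
```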

When a video is recorded, the camera is generally not facing the screen head-on, so the document data contained in the recorded video is usually inclined (here, inclined means that an included angle exists between any one boundary of the document data and the corresponding boundary of the image where the document data is located, and the included angle is greater than the included angle threshold). Therefore, tilt detection and correction are needed when intercepting the document data. Tilt detection is performed on the document data first, and if the document data is tilted, it is corrected before interception. Commonly used tilt detection methods include text-line-based detection, projection profile analysis, and the Hough transform.

In one embodiment, when it is identified that an included angle exists between any one boundary of the document data contained in the image content and the corresponding boundary of the image where the document data is located, and the included angle is greater than an included angle threshold, intercepting the document data in the image includes: performing perspective correction clipping on the document data.

Specifically, performing perspective correction clipping on the document data means correcting the included angles between all boundaries of the document data and the corresponding boundaries of the image where the document data is located to within the included angle threshold. Photoshop or other techniques, such as a distorted document image restoration technique, may be used, which is not limited herein.
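The application does not fix a particular correction technique. One common way to realize a perspective correction, shown here only as an assumed illustration, is to compute the projective transform (homography) that maps the four detected corners of the document data onto an upright rectangle. The sketch solves the eight unknowns with Gauss-Jordan elimination; all function names are hypothetical.

```python
Point = tuple[float, float]


def solve_linear(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a square linear system by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate corner configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography(src: list[Point], dst: list[Point]) -> list[float]:
    """Return h0..h7 of the transform mapping four src corners onto dst."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    return solve_linear(a, b)


def warp_point(h: list[float], x: float, y: float) -> Point:
    """Apply the transform to one point."""
    w = h[6] * x + h[7] * y + 1.0
    return ((h[0] * x + h[1] * y + h[2]) / w, (h[3] * x + h[4] * y + h[5]) / w)


# Tilted page corners (top-left, top-right, bottom-right, bottom-left)
# mapped onto an upright 100 x 60 rectangle.
corners = [(10, 20), (110, 30), (120, 90), (5, 80)]
upright = [(0, 0), (100, 0), (100, 60), (0, 60)]
h = homography(corners, upright)
```

A full correction would additionally resample every pixel of the output rectangle through the inverse transform; that step is omitted here.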

The included angle threshold may be set according to actual requirements; for example, the included angle threshold may be set to 5°.
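The threshold check can be sketched as follows, assuming the four corners of the document data have already been detected and are given in the order top-left, top-right, bottom-right, bottom-left. The function names and the corner ordering are assumptions.

```python
import math

Point = tuple[float, float]


def edge_angle_deg(p1: Point, p2: Point, horizontal: bool) -> float:
    """Included angle (0..90 degrees) between a document boundary and the
    corresponding image boundary (horizontal or vertical)."""
    dx = p2[0] - p1[0]
    dy = p2[1] - p1[1]
    angle = math.degrees(math.atan2(dy, dx)) % 180.0  # independent of direction
    if not horizontal:
        angle = (angle - 90.0) % 180.0
    return min(angle, 180.0 - angle)


def needs_perspective_correction(corners: list[Point],
                                 threshold_deg: float = 5.0) -> bool:
    """True when any boundary exceeds the included angle threshold."""
    tl, tr, br, bl = corners
    edges = [(tl, tr, True), (tr, br, False), (bl, br, True), (tl, bl, False)]
    return any(edge_angle_deg(a, b, h) > threshold_deg for a, b, h in edges)


upright = [(0, 0), (100, 0), (100, 50), (0, 50)]
slightly_tilted = [(0, 0), (100, 2), (100, 52), (0, 50)]   # about 1.1 degrees
tilted = [(0, 0), (100, 10), (100, 60), (0, 50)]           # about 5.7 degrees
```

With the example threshold of 5°, only the last of the three quadrilaterals would be sent to perspective correction clipping.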

S130, splicing the intercepted images, and outputting a spliced file in an electronic document format.

Specifically, an intercepted image is a document data image intercepted from an image. Splicing the intercepted images may include splicing them into one spliced image and outputting the spliced image as a spliced file in the electronic document format. Alternatively, the intercepted images may be inserted into a Word document, a PPT document, a PDF document, or the like, and either spliced within that document or each placed on its own page of the document, after which the document is output as the spliced file in the electronic document format. After the spliced file is output, the file storage path can be sent to the user, so that the stored file can be found in the file manager.

The electronic document format may be set according to the actual requirements of the user. Optionally, the electronic document format may include any one of a presentation file format, PDF (Portable Document Format), Rich Text Format (RTF), a Word format, and a Word Processing System (WPS) document format. Other formats capable of displaying images, such as the Excel workbook format, a web page format, and the MHT file format, can also be used.

It can be understood that a lecturer often jumps back to a previously presented page of the PPT slides or courseware. In this case, the video frame image corresponding to a marked mark point, or a photo taken by the photographer, may contain the same content as the video frame image or photo corresponding to an earlier mark point. If all the intercepted images are spliced directly, the page order of the spliced file may not correspond to the page order of the original PPT slides or courseware, and the spliced file may contain repeated content. Therefore, the intercepted images need to be sorted when splicing.

In one embodiment, stitching the captured images comprises:

and splicing the M intercepted images according to the first splicing sequence.

Specifically, the playing time order of the document data images in the target video is related to the times at which the mark points were marked in the target video: a mark point marked earlier has an earlier position in the playing time order, and a mark point marked later has a later position. That is, the playing time order of the document data images in the target video is the time order of the mark points.

The first splicing sequence is the display sequence of the intercepted images in the output spliced file, and the first splicing sequence is consistent with the playing time sequence and is the time sequence when the mark points are marked. And splicing the M intercepted images according to the first splicing sequence.
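The first splicing sequence can be sketched as a sort by mark time. The `CapturedImage` record and its field names are assumptions used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CapturedImage:
    name: str
    mark_time_s: float  # time of the mark point in the target video


def first_splicing_order(images: list[CapturedImage]) -> list[CapturedImage]:
    """Order intercepted images by the time their mark points were marked."""
    return sorted(images, key=lambda im: im.mark_time_s)


captured = [
    CapturedImage("page_c", 95.0),
    CapturedImage("page_a", 12.5),
    CapturedImage("page_b", 40.0),
]
ordered = first_splicing_order(captured)
```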

In one embodiment, stitching the captured images comprises:

and splicing the M intercepted images according to the second splicing sequence.

Specifically, in document data such as PPT slides or courseware, the page number is generally placed at the left, middle, or right of the top or bottom of the page. The positions in the intercepted image where a page number may be placed are detected, the page number of the document data image is determined, and the document page numbers of the M intercepted images are thereby determined.

The second splicing sequence is the display sequence of the intercepted images in the output splicing file, and the second splicing sequence is consistent with the page sequence of document data such as PPT or courseware and the like. And splicing the M intercepted images according to the second splicing sequence.
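The second splicing sequence can be sketched as a sort by detected page number. The sketch assumes that the text found at the page number position of each intercepted image has already been recognized, and that images without a readable page number are placed last in their original order. Both points are assumptions.

```python
import re
from typing import Optional


def parse_page_number(region_text: str) -> Optional[int]:
    """Extract the first integer from the text of a page-number region."""
    match = re.search(r"\d+", region_text)
    return int(match.group()) if match else None


def second_splicing_order(region_texts: list[str]) -> list[int]:
    """Return image indices sorted by document page number."""
    numbered: list[tuple[int, int]] = []
    unnumbered: list[int] = []
    for i, text in enumerate(region_texts):
        page = parse_page_number(text)
        if page is None:
            unnumbered.append(i)
        else:
            numbered.append((page, i))
    return [i for _, i in sorted(numbered)] + unnumbered


order = second_splicing_order(["Page 3", "2", "no number", "Page 1"])
```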

In the embodiment of the application, when the image content of the N images contains document data, the document data images are intercepted from the N images to obtain M intercepted images, the M intercepted images are spliced, and the spliced file is output in the electronic document format, so that the time for a user to sort out documents such as PPT (PowerPoint) documents or courseware documents can be shortened and efficiency is improved.
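The overall flow of S110 to S130 can be summarized in a short sketch. The identification, interception, comparison, and ordering steps are passed in as functions because the application leaves their implementation open; the function name `process_images` and all parameter names are assumptions, and writing the ordered pages to an electronic document file is omitted.

```python
from typing import Callable, TypeVar

Raw = TypeVar("Raw")
Page = TypeVar("Page")


def process_images(
    images: list[Raw],
    contains_document: Callable[[Raw], bool],
    crop_document: Callable[[Raw], Page],
    same_document: Callable[[Page, Page], bool],
    order_key: Callable[[Page], float],
) -> list[Page]:
    """Return the M ordered pages obtained from N input images."""
    captured: list[Page] = []
    for image in images:
        if not contains_document(image):      # S110: identify image content
            continue
        page = crop_document(image)           # S120: intercept the document data
        if not any(same_document(page, kept) for kept in captured):
            captured.append(page)             # keep one copy of identical pages
    return sorted(captured, key=order_key)    # S130: splicing order


# Toy stand-ins: strings play the role of images.
pages = process_images(
    ["slide:2", "audience", "slide:1", "slide:2"],
    contains_document=lambda s: s.startswith("slide:"),
    crop_document=lambda s: s.split(":")[1],
    same_document=lambda a, b: a == b,
    order_key=int,
)
```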

The following describes the image processing method proposed in the embodiment of the present application, taking a recorded tag video as an example.

A user records a tag video with the camera, manually marking tags while recording. After recording is finished, the user opens the tag video in the album on the mobile phone, where a view-marks entry can be displayed. Clicking the entry expands the marks to show the time in the video of the video frame image corresponding to each mark point. The video frame image corresponding to each mark point is identified to determine whether it contains document data. When a video frame image contains document data, an export-document button can be displayed on the album interface of the mobile phone. After the user clicks the button, the document data in the video is intercepted and spliced, and pages that need correction undergo perspective correction clipping. During splicing, the order of the document data can be determined based on information such as the page number and the time of each video frame image, and it is judged whether identical document data exists; if so, deduplication is performed. The document data in the video is then exported and stored as a PDF file, and the storage path of the file can be shown to the user so that the file can be found in the file manager.

Fig. 2 is a schematic structural diagram of an image processing apparatus 200 according to an embodiment of the present disclosure. As shown in fig. 2, the apparatus may implement the method shown in fig. 1, and the apparatus may include:

an identifying module 210 for identifying image content of the N images;

the intercepting module 220 is configured to intercept document data images from the N images to obtain M intercepted images when the image content of the N images is identified to include document data;

and the output module 230 is configured to splice the M captured images and output a spliced file in an electronic document format.

Optionally, the image is a video frame image, and the apparatus further includes:

the first acquisition module is used for acquiring a target video and a mark point of a mark record in the target video;

and the determining module is used for determining the video frame image corresponding to the mark point in the target video according to the mark point.

Optionally, the apparatus further comprises:

the input receiving module is used for receiving the mark input on the target video in the target video recording process or the target video playing process;

the response module is used for responding to the mark input and marking mark points on the corresponding video frame images in the target video;

wherein, each mark point corresponds to a video frame image.

Optionally, the output module 230 is further configured to:

and splicing the M intercepted images according to the first splicing sequence.

Optionally, the output module 230 is further configured to:

and splicing the M intercepted images according to a second splicing sequence.

Optionally, the intercept module 220 is further configured to:

Optionally, when it is identified that an included angle exists between any one boundary of the document data contained in the image content and the corresponding boundary of the image where the document data is located, and the included angle is greater than an included angle threshold, the intercepting module 220 is further configured to:

and performing perspective correction clipping on the document data.

Optionally, the electronic document format includes any one of a presentation file format, a PDF format, a rich text format, a word format, and a text editing system document format.

Optionally, the documentation includes any one of a PPT document and a courseware document.

The image processing apparatus provided in this embodiment can execute the embodiments of the method described above, and the implementation principle and the technical effect are similar, which are not described herein again.

Fig. 3 is a schematic structural diagram of an electronic device 300 suitable for implementing embodiments of the present application.

As shown in fig. 3, the electronic device 300 includes a Central Processing Unit (CPU) 301 that can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 302 or a program loaded from a storage section 308 into a Random Access Memory (RAM) 303. The RAM 303 also stores various programs and data necessary for the operation of the electronic device 300. The CPU 301, ROM 302, and RAM 303 are connected to each other via a bus 304. An input/output (I/O) interface 305 is also connected to the bus 304.

The following components are connected to the I/O interface 305: an input section 306 including a keyboard, a mouse, and the like; an output section 307 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 308 including a hard disk and the like; and a communication section 309 including a network interface card such as a LAN card, a modem, or the like. The communication section 309 performs communication processing via a network such as the internet. A drive 310 is also connected to the I/O interface 305 as needed. A removable medium 311 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 310 as necessary, so that a computer program read out therefrom is installed into the storage section 308 as necessary.

In particular, the process described above with reference to fig. 1 may be implemented as a computer software program, according to an embodiment of the present disclosure. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program comprising program code for performing the image processing method described above. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 309, and/or installed from the removable medium 311.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units or modules described in the embodiments of the present application may be implemented by software or hardware. The described units or modules may also be provided in a processor. The names of these units or modules do not in some cases constitute a limitation of the unit or module itself.

As another aspect, the present application also provides a storage medium, which may be the storage medium contained in the foregoing device in the above embodiment; or may be a storage medium that exists separately and is not assembled into the device. The storage medium stores one or more programs that are used by one or more processors to execute the image processing methods described herein.

The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by a person skilled in the art that the scope of the invention as referred to in the present application is not limited to the embodiments with a specific combination of the above-mentioned features, but also covers other embodiments with any combination of the above-mentioned features or their equivalents without departing from the inventive concept. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims

1. An image processing method, characterized in that the method comprises:

identifying image content of the N images;

when the image content of the N images is identified to contain document data, intercepting document data images from the N images to obtain M intercepted images;

and splicing the M intercepted images, and outputting a spliced file in an electronic document format.

2. The method of claim 1, wherein the image is a video frame image;

before the image content of the N images is identified, the method further includes:

acquiring a mark point of a mark record in a target video;

3. The method of claim 2, wherein before the acquiring of the mark point of the mark record in the target video, the method further comprises:

receiving a mark input on the target video during recording or playback of the target video;

wherein each mark point corresponds to a video frame image.
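As an illustration only, the correspondence between mark points and video frame images could be sketched as follows. The `MarkPoint` structure, the function names, and the use of a constant frame rate are assumptions made for this sketch and are not taken from the application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MarkPoint:
    """A mark recorded on the video timeline (hypothetical structure)."""
    time_s: float  # playback position of the mark, in seconds


def frame_index_for_mark(mark: MarkPoint, fps: float, frame_count: int) -> int:
    """Map a mark point to the index of the frame displayed at that time."""
    if fps <= 0 or frame_count <= 0:
        raise ValueError("fps and frame_count must be positive")
    index = int(mark.time_s * fps)  # assumes a constant frame rate
    return max(0, min(index, frame_count - 1))  # clamp to a valid frame


def frames_for_marks(marks, fps, frame_count):
    """Return one frame index per mark point, in the order the marks are given."""
    return [frame_index_for_mark(m, fps, frame_count) for m in marks]
```

The resulting frame indices identify the N video frame images that would then be passed to content identification.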

4. The method according to any one of claims 1-3, wherein the splicing of the M intercepted images comprises:

acquiring the playing time sequence, in the target video, of the document data images corresponding to the M intercepted images;

and splicing the M intercepted images according to the first splicing sequence.
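A minimal sketch of ordering by playing time is given below, assuming that the first splicing sequence simply follows ascending playing time in the target video. The `InterceptedImage` structure and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InterceptedImage:
    """An intercepted document data image (hypothetical structure)."""
    label: str          # identifier of the intercepted image
    play_time_s: float  # playing time of its source frame, in seconds


def first_splicing_order(images):
    """Order intercepted images by ascending playing time.

    sorted() is stable, so images sharing a playing time keep their input order.
    """
    return sorted(images, key=lambda img: img.play_time_s)
```

Splicing then proceeds over the returned list from first to last.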

5. The method according to any one of claims 1-3, wherein the splicing of the M intercepted images comprises:

determining document page numbers of the document data images corresponding to the M intercepted images;

6. The method according to any one of claims 1-3, wherein the intercepting of document data images from the N images to obtain M intercepted images comprises:

when identical document data images exist among the document data images intercepted from the N images, taking one of the identical document data images as the intercepted image.
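The retention of a single copy among identical document data images could be sketched as below. Exact byte equality, tested through a SHA-256 digest, is an assumption used here as a stand-in for whatever sameness criterion an implementation adopts; crops taken from different video frames are rarely byte-identical, so a practical system would likely need a similarity measure instead.

```python
import hashlib


def drop_duplicate_crops(crops):
    """Keep the first occurrence of each distinct crop.

    crops: iterable of bytes objects, each holding one encoded crop.
    """
    seen = set()
    unique = []
    for crop in crops:
        digest = hashlib.sha256(crop).hexdigest()  # fingerprint of the crop bytes
        if digest not in seen:
            seen.add(digest)
            unique.append(crop)
    return unique
```

With N input crops, the length of the returned list corresponds to M, so M is at most N.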

7. The method according to any one of claims 1-3, wherein when it is identified that an included angle exists between any boundary of the document data contained in the image content and the corresponding boundary of the image in which the document data is located, and the included angle is greater than an included angle threshold,

the intercepting of document data images from the N images comprises:

performing perspective correction cropping on the document data.
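The included-angle test can be illustrated with the following sketch. It covers only the decision of whether perspective correction is needed, not the correction itself. The corner ordering, the function names, and the default threshold of 5 degrees are assumptions for illustration; the application does not specify a threshold value.

```python
import math


def included_angle_deg(p1, p2, q1, q2):
    """Acute angle, in degrees, between line p1-p2 and line q1-q2."""
    a = math.atan2(p2[1] - p1[1], p2[0] - p1[0])
    b = math.atan2(q2[1] - q1[1], q2[0] - q1[0])
    diff = abs(a - b) % math.pi  # lines are undirected, so fold into [0, pi)
    return math.degrees(min(diff, math.pi - diff))


def needs_perspective_correction(doc_corners, image_size, threshold_deg=5.0):
    """Return True if any document boundary is tilted beyond the threshold.

    doc_corners: (x, y) points ordered top-left, top-right, bottom-right,
    bottom-left. image_size: (width, height). threshold_deg is an assumed value.
    """
    w, h = image_size
    img_corners = [(0, 0), (w, 0), (w, h), (0, h)]
    for i in range(4):
        j = (i + 1) % 4  # each boundary runs from corner i to corner i+1
        angle = included_angle_deg(doc_corners[i], doc_corners[j],
                                   img_corners[i], img_corners[j])
        if angle > threshold_deg:
            return True
    return False
```

When the function returns True, an implementation would apply a perspective transform to the four document corners before cropping; otherwise a plain rectangular crop suffices.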

8. The method according to any one of claims 1-3, wherein the electronic document format comprises any one of a presentation file format, a PDF format, a rich text format, a Word format, and a text editing system document format.

9. The method according to any one of claims 1-3, wherein the document data comprises any one of a PPT document and a courseware document.

10. An image processing apparatus, characterized by comprising:

an intercepting module, configured to intercept document data images from the N images to obtain M intercepted images when document data is identified to be contained in the image content of the N images;

11. An apparatus comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the image processing method according to any of claims 1-9 when executing the program.

12. A readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the image processing method according to any one of claims 1 to 9.