CN114926830B - Screen image recognition method, apparatus, device and computer readable medium - Google Patents

Screen image recognition method, apparatus, device and computer readable medium Download PDF

Info

Publication number
CN114926830B
Authority
CN
China
Prior art keywords
cut
cutting
screen
screen image
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210599436.1A
Other languages
Chinese (zh)
Other versions
CN114926830A (en)
Inventor
车文彬
刘超
张超
郭丽娜
陈逸帆
张俊杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Shurui Data Technology Co ltd
Original Assignee
Nanjing Shurui Data Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Shurui Data Technology Co ltd filed Critical Nanjing Shurui Data Technology Co ltd
Priority to CN202210599436.1A priority Critical patent/CN114926830B/en
Publication of CN114926830A publication Critical patent/CN114926830A/en
Application granted granted Critical
Publication of CN114926830B publication Critical patent/CN114926830B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/62Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V20/635Overlay text, e.g. embedded captions in a TV program
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4038Image mosaicing, e.g. composing plane images from plane sub-images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/22Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2200/00Indexing scheme for image data processing or generation, in general
    • G06T2200/32Indexing scheme for image data processing or generation, in general involving image mosaicing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20021Dividing image into blocks, subimages or windows
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20112Image segmentation details
    • G06T2207/20132Image cropping
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Character Input (AREA)
  • Character Discrimination (AREA)

Abstract

Embodiments of the present disclosure disclose screen image recognition methods, apparatuses, devices, and computer-readable media. One embodiment of the method comprises the following steps: acquiring a screen image to be cut; cutting the screen image to be cut according to each preset cutting size to obtain a cut screen image group; performing component detection processing using the cut image units in each cut screen image in the cut screen image group to obtain a target component information set; performing character recognition on the screen image to be cut to obtain a character information set; and combining the target component information set and the character information set to obtain a screen image identification information set. This embodiment can more comprehensively identify the components contained in the screen image.

Description

Screen image recognition method, apparatus, device and computer readable medium
Technical Field
Embodiments of the present disclosure relate to the field of computer technology, and in particular, to a screen image recognition method, apparatus, device, and computer readable medium.
Background
Screen image recognition is a technique for recognizing components and characters in a screen image. Currently, screen images are generally identified by the following approach: the screen image is recognized as a whole.
However, when the screen image is recognized in the above manner, the following technical problems often arise:
First, the component detection categories are limited, so not all components contained in the screen image can be recognized;
Second, the requirement on screen image definition is high, and when the definition of the screen image is low, the accuracy of character recognition decreases.
Disclosure of Invention
This summary is provided to introduce concepts in a simplified form that are described further in the detailed description below. It is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
Some embodiments of the present disclosure propose screen image recognition methods, apparatuses, devices, and computer-readable media to solve one or more of the technical problems mentioned in the background section above.
In a first aspect, some embodiments of the present disclosure provide a screen image recognition method, the method comprising: acquiring a screen image to be cut; cutting the screen image to be cut according to each preset cutting size to obtain a cut screen image group, wherein each cut screen image in the cut screen image group is composed of cut image units of the corresponding cutting size; performing component detection processing using the cut image units in each cut screen image in the cut screen image group to obtain a target component information set; performing character recognition on the screen image to be cut to obtain a character information set; and combining the target component information set and the character information set to obtain a screen image identification information set.
Optionally, performing component detection processing using the cut image units in each cut screen image in the cut screen image group to obtain a target component information set includes:
performing the following component detection steps for each cut image unit in each cut screen image, using a preset component detection frame information set:
determining whether the cut image unit meets a preset condition, wherein the preset condition is that the probability value of the center point of the cut image unit being a component center point is greater than a preset probability value;
and in response to determining that the cut image unit meets the preset condition, performing component detection centered on the cut image unit using the component detection frame information set to obtain component information.
Optionally, the component detection frame information in the component detection frame information set includes a component detection frame and a component detection frame type; and
performing component detection centered on the cut image unit using the component detection frame information set in response to determining that the cut image unit meets the preset condition, to obtain component information, includes:
determining the component detection frame type in the component detection frame information that matches the detected component in the component detection frame information set as the component type of the detected component;
determining the center point of the cut image unit as the component center point of the detected component, and determining the coordinates of the component center point and the detected component size as the component position of the detected component;
and combining the component type and the component position to obtain the component information of the detected component.
Optionally, performing component detection processing on the cut image units in each cut screen image in the cut screen image group to obtain a target component information set further includes:
performing de-duplication processing on each piece of obtained component information, and determining the de-duplicated component information as target component information to obtain the target component information set.
In a second aspect, some embodiments of the present disclosure provide a screen image recognition apparatus, the apparatus comprising: an acquisition unit configured to acquire a screen image to be cut; a cutting unit configured to cut the screen image to be cut according to each preset cutting size to obtain a cut screen image group, wherein each cut screen image in the cut screen image group is composed of cut image units of the corresponding cutting size; a component detection unit configured to perform component detection processing using the cut image units in each cut screen image in the cut screen image group to obtain a target component information set; a character recognition unit configured to perform character recognition on the screen image to be cut to obtain a character information set; and a combining unit configured to combine the target component information set and the character information set to obtain a screen image identification information set.
In a third aspect, some embodiments of the present disclosure provide an electronic device comprising: one or more processors; and a storage device having one or more programs stored thereon which, when executed by the one or more processors, cause the one or more processors to implement the method described in any of the implementations of the first aspect above.
In a fourth aspect, some embodiments of the present disclosure provide a computer readable medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method described in any of the implementations of the first aspect above.
The above embodiments of the present disclosure have the following beneficial effects: with the screen image recognition method of some embodiments of the present disclosure, the components contained in a screen image can be identified more comprehensively. Specifically, the reason component detection is incomplete is that the component detection categories are limited. On this basis, the screen image recognition method of some embodiments of the present disclosure first acquires a screen image to be cut. Then, the screen image to be cut is cut according to each preset cutting size to obtain a cut screen image group, where each cut screen image in the group is composed of cut image units of the corresponding cutting size. Dividing the screen image to be cut into parts in this way facilitates subsequent component detection and character recognition. Next, component detection is performed using the cut image units in each cut screen image in the group to obtain a target component information set. Performing component detection on cut image units of different cutting sizes improves the comprehensiveness of component detection and avoids missed detections. Then, character recognition is performed on the screen image to be cut to obtain a character information set. Finally, the target component information set and the character information set are combined to obtain a screen image identification information set. In this way, the components contained in the screen image can be identified more comprehensively.
Drawings
The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. The same or similar reference numbers will be used throughout the drawings to refer to the same or like elements. It should be understood that the figures are schematic and that elements and components are not necessarily drawn to scale.
FIG. 1 is a flow chart of some embodiments of a screen image recognition method according to the present disclosure;
FIG. 2 is a schematic structural view of some embodiments of a screen image recognition device of the present disclosure;
fig. 3 is a schematic structural diagram of an electronic device suitable for use in implementing some embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.
It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings. Embodiments of the present disclosure and features of embodiments may be combined with each other without conflict.
It should be noted that the terms "first," "second," and the like in this disclosure are merely used to distinguish between different devices, modules, or units and are not used to define an order or interdependence of functions performed by the devices, modules, or units.
It should be noted that references to "one" and "a plurality" in this disclosure are illustrative rather than limiting, and those of ordinary skill in the art will appreciate that these should be understood as "one or more" unless the context clearly indicates otherwise.
The names of messages or information interacted between the various devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.
The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Referring now to FIG. 1, a flow 100 of some embodiments of screen image recognition methods according to the present disclosure is shown. The screen image identification method comprises the following steps:
step 101, obtaining a screen image to be cut.
In some embodiments, the execution subject of the screen image recognition method may acquire the screen image to be cut through a wired or wireless connection. The screen image to be cut may be a screen image captured from a display screen.
And 102, cutting the screen images to be cut according to preset cutting sizes to obtain a cut screen image group.
In some embodiments, the executing body may perform a cutting process on the screen image to be cut according to preset cutting sizes, to obtain a cut screen image group. The cut screen image in the cut screen image group may be composed of cut image units corresponding to the cut size. The respective cut sizes may represent the number of cuts for the length and width of the screen image to be cut.
As an example, the cut sizes may include 13×13, 26×26, and 52×52. Here, 13×13 means cutting the screen image to be cut into 13×13 square cut image units of equal size and equal length and width; 26×26 means cutting it into 26×26 such units; and 52×52 means cutting it into 52×52 such units.
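As an illustrative sketch (not from the patent itself), the S×S grid cutting described above can be expressed as follows. The image is modeled as a nested list of pixel rows, and the side length is assumed to be divisible by the cut size:

```python
def cut_into_grid(image, s):
    """Cut a square image (a list of pixel rows) into an s x s grid of
    equal square cut image units.

    Returns a dict mapping (row, col) grid indices to the cropped sub-image.
    Assumes the image side length is divisible by s.
    """
    side = len(image)
    assert side % s == 0, "image side must be divisible by the cut size"
    unit = side // s
    grid = {}
    for gy in range(s):
        for gx in range(s):
            # slice out the unit-sized square at grid position (gy, gx)
            grid[(gy, gx)] = [row[gx * unit:(gx + 1) * unit]
                              for row in image[gy * unit:(gy + 1) * unit]]
    return grid
```

For a 52-pixel-wide square image, calling `cut_into_grid(image, 13)` yields 169 units of 4×4 pixels each; the 26×26 and 52×52 cuts follow the same pattern at finer granularity.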
In some optional implementations of some embodiments, the executing body performs a cutting process on the screen image to be cut according to preset cutting sizes to obtain a cut screen image group, and may include the following steps:
and in the first step, in response to determining that the length and width of the screen images to be cut are unequal, performing equal-length processing on the screen images to be cut to obtain the screen images to be cut with equal length and width.
The equal-length processing may be to splice blank areas on short sides of the screen image to be cut, so that lengths and widths of the screen image to be cut after the blank areas are spliced are equal.
And secondly, cutting the screen images to be cut with equal length and width to obtain a cut screen image group.
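The equal-length (pad-to-square) step above can be sketched as follows. This is an illustrative example rather than the patent's implementation; the image is again modeled as a nested list, and `blank` is an assumed fill value for the spliced blank area:

```python
def pad_to_square(image, blank=0):
    """Splice a blank area onto the short side so length equals width."""
    h = len(image)
    w = len(image[0]) if h else 0
    side = max(h, w)
    # pad each row on the right, then append blank rows at the bottom
    padded = [row + [blank] * (side - w) for row in image]
    padded += [[blank] * side for _ in range(side - h)]
    return padded
```

After padding, the square image can be cut into the S×S grids of equal-sized units described above.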
And step 103, performing component detection processing by using the cutting image units in each cutting screen image in the cutting screen image group to obtain a target component information set.
In some embodiments, the executing body performs the component detection processing by using the cut image unit in each cut screen image in the cut screen image group to obtain the target component information set, and may include the following steps:
the first step, using a preset component detection frame information set, for each cut image unit in the cut screen image, performs the following component detection steps:
determining whether the cut image unit meets a preset condition, wherein the preset condition may be that a probability value of the center point of the cut image unit as a component center point is greater than a preset probability value. The component detection frame information in the component detection frame information set may include a component detection frame. The YOLO (You Only Look Once) model may be used to determine a probability value for the center point of the cut image element as the component center point. In practice, the preset probability value may be adjusted according to practical application needs, which is not limited herein. The component may be a page element such as a button, a text box, etc. displayed in the screen image to be cut.
And secondly, in response to determining that the cutting image unit meets the preset condition, performing component detection by using the component detection frame information set and taking the cutting unit as a center to obtain component information. The size of the component detection frame may be used as component information of the component satisfying the above-described preset condition.
And secondly, performing de-duplication processing on the obtained component information, and determining the de-duplicated component information as target component information to obtain a target component information set. Wherein the above component information may be deduplicated using an NMS (non maximum suppression, non-maximal suppression) algorithm.
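The NMS de-duplication mentioned above keeps the highest-scoring detection and discards any detection whose box overlaps it too much. A minimal sketch, assuming boxes as `(x1, y1, x2, y2)` corner coordinates (a common convention, not specified by the patent):

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def nms(detections, iou_threshold=0.5):
    """detections: list of (score, box). Keep highest-scoring boxes,
    dropping any box that overlaps an already-kept box too much."""
    kept = []
    for score, box in sorted(detections, reverse=True):
        if all(iou(box, k[1]) <= iou_threshold for k in kept):
            kept.append((score, box))
    return kept
```

In the method above, the same component detected from units of several cut sizes produces near-identical boxes, which this step collapses into one piece of target component information.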
In some optional implementations of some embodiments, the component detection box information in the component detection box information set described above may include a component detection box and a component detection box type. The executing body may perform component detection with the cutting unit as a center by using the component detection frame information set in response to determining that the cutting image unit satisfies the preset condition, to obtain component information, and may include the steps of:
and determining the component detection frame type in the component detection frame information matched with the detected component in the component detection frame information set as the component type of the detected component.
And a second step of determining the center point of the cutting image unit as the component center point of the detected component, and determining the coordinates of the component center point and the size of the component detected by the component detection as the component position of the detected component.
And thirdly, combining the component type and the component position to obtain the component information of the detected component.
And 104, performing character recognition on the screen image to be cut to obtain a character information set.
In some embodiments, the performing body performs text recognition on the screen image to be cut to obtain a text information set, and may include the following steps:
the first step, the coordinates of each text region in the screen image to be cut are determined, and a text region coordinate set is obtained. Wherein, the coordinates of each text region can be determined by utilizing a sliding window algorithm (Sliding Window Algorithm) or a candidate region algorithm (Region Proposal Algorithms) and the like.
And secondly, cutting the screen image to be cut according to the text region coordinate set to obtain a text region image set.
Specifically, the region corresponding to each text region coordinate in the text region coordinate set may be cut out from the screen image to be cut, thereby obtaining the text region image set.
And thirdly, performing character recognition on each character area image in the character area image set to obtain a character information set.
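The region-cutting step above amounts to cropping each detected rectangle from the image. As an illustrative sketch (assuming `(x1, y1, x2, y2)` pixel rectangles, a convention not fixed by the patent):

```python
def crop_regions(image, region_coords):
    """Cut out each text region from the image.

    image: nested list of pixel rows.
    region_coords: list of (x1, y1, x2, y2) rectangles in pixel coordinates.
    Returns the cropped sub-images in the same order.
    """
    crops = []
    for (x1, y1, x2, y2) in region_coords:
        # rows y1..y2, columns x1..x2 of each row
        crops.append([row[x1:x2] for row in image[y1:y2]])
    return crops
```

Each cropped text region image is then passed to the character recognition model, limiting recognition to small regions as described.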
In some optional implementations of some embodiments, the executing body performs text recognition on each text region image in the text region image set to obtain a text information set, and may include the following steps:
the first step is to input the above character area image into a pre-trained character recognition model to obtain character content.
And a second step of combining the text content with the center point coordinates of each text region coordinate in the text region coordinate set corresponding to the text region image to obtain text information.
Optionally, the text recognition model may include: convolution layer, loop layer and transcription layer. The executing body inputs the text region image into a pre-trained text recognition model to obtain text content, and the executing body may include the following steps:
and step one, inputting the character area image into the convolution layer to obtain a characteristic sequence.
And secondly, inputting the characteristic sequence into the circulating layer to obtain the label distribution of the characteristic sequence.
And thirdly, inputting the label distribution into the transcription layer to obtain the text content.
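The convolution/recurrent/transcription architecture described above is the typical CRNN layout, where the transcription layer commonly applies CTC-style greedy decoding: pick the most probable label per time step, collapse consecutive repeats, then drop blanks. The following is a sketch of that transcription step only (the blank symbol and alphabet are illustrative assumptions, not from the patent):

```python
BLANK = "-"  # hypothetical blank symbol appended after the alphabet

def ctc_greedy_decode(label_distribution, alphabet):
    """Greedy CTC-style transcription of a label distribution.

    label_distribution: list of per-timestep probability lists over
    `alphabet` plus a trailing blank entry.
    """
    symbols = list(alphabet) + [BLANK]
    # most probable symbol at each time step
    best = [symbols[max(range(len(p)), key=p.__getitem__)]
            for p in label_distribution]
    out, prev = [], None
    for s in best:
        if s != prev and s != BLANK:  # collapse repeats, drop blanks
            out.append(s)
        prev = s
    return "".join(out)
```

For example, the per-step choices `a, a, blank, b` decode to `"ab"`, while `a, blank, a` decode to `"aa"`, since the blank separates genuine repeated characters.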
The above steps are an invention point of the embodiments of the present disclosure and solve the second technical problem mentioned in the background, namely that "the accuracy of character recognition is reduced." The factor that reduces character recognition accuracy is as follows: the requirement on screen image definition is high, and when the definition of the screen image is low, the accuracy of character recognition decreases. Addressing this factor improves character recognition accuracy. To achieve this effect, the present disclosure first determines the coordinates of each text region in the screen image to be cut, and then cuts the screen image to be cut to obtain a text region image set. This limits the range of character recognition and avoids recognizing over a large area, which reduces the influence of image definition on character recognition to a certain extent and thereby improves the accuracy of character recognition.
Step 105, combining the target component information set and the character information set to obtain a screen image identification information set.
In some embodiments, the execution body may perform a combination process on the target component information set and the text information set to obtain a screen image identification information set. The target component information in the target component information set and the text information in the text information set can be used as screen picture identification information to obtain a screen picture identification information set.
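The combination step above simply merges both information sets into one identification set. A minimal sketch, with hypothetical dictionary shapes for the two kinds of information (the patent does not prescribe a data format):

```python
def combine_identification_info(component_infos, text_infos):
    """Merge component information and text information into a single
    screen image identification information set.

    component_infos: list of dicts like {"type": ..., "position": ...}
    text_infos: list of dicts like {"content": ..., "position": ...}
    """
    identification = []
    for c in component_infos:
        identification.append({"kind": "component", **c})
    for t in text_infos:
        identification.append({"kind": "text", **t})
    return identification
```

Each entry of the returned list is one piece of screen image identification information, tagged with its origin so a downstream consumer can tell components from recognized text.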
In some alternative implementations of some embodiments, the screen identifying information in the set of screen identifying information may include a component type and a component location, or a text content and a text location. The execution body may further send the set of screen image identification information and the screen image to be cut to a target terminal for display.
The above embodiments of the present disclosure have the following beneficial effects: with the screen image recognition method of some embodiments of the present disclosure, the components contained in a screen image can be identified more comprehensively. Specifically, the reason component detection is incomplete is that the component detection categories are limited. On this basis, the screen image recognition method of some embodiments of the present disclosure first acquires a screen image to be cut. Then, the screen image to be cut is cut according to each preset cutting size to obtain a cut screen image group, where each cut screen image in the group is composed of cut image units of the corresponding cutting size. Dividing the screen image to be cut into parts in this way facilitates subsequent component detection and character recognition. Next, component detection is performed using the cut image units in each cut screen image in the group to obtain a target component information set. Performing component detection on cut image units of different cutting sizes improves the comprehensiveness of component detection and avoids missed detections. Then, character recognition is performed on the screen image to be cut to obtain a character information set. Finally, the target component information set and the character information set are combined to obtain a screen image identification information set. In this way, the components contained in the screen image can be identified more comprehensively.
With further reference to fig. 2, as an implementation of the method shown in the above figures, the present disclosure provides some embodiments of a screen image recognition apparatus, which correspond to the method embodiments shown in fig. 1, and which are particularly applicable in various electronic devices.
As shown in fig. 2, the screen image recognition apparatus 200 of some embodiments includes: an acquisition unit 201, a cutting unit 202, a component detection unit 203, a word recognition unit 204, and a combination unit 205. Wherein the acquisition unit 201 is configured to acquire a screen image to be cut; a cutting unit 202 configured to perform cutting processing on the screen images to be cut according to preset cutting sizes to obtain a cut screen image group, wherein the cut screen images in the cut screen image group are composed of cut image units with corresponding cutting sizes; a component detecting unit 203 configured to perform component detection processing by using a cut image unit in each cut screen image in the above-mentioned cut screen image group, to obtain a target component information set; the text recognition unit 204 is configured to perform text recognition on the screen image to be cut to obtain a text information set; and a combining unit 205 configured to perform a combination process on the target component information set and the text information set to obtain a screen image identification information set.
It will be appreciated that the elements described in the apparatus 200 correspond to the various steps in the method described with reference to fig. 1. Thus, the operations, features and resulting benefits described above for the method are equally applicable to the apparatus 200 and the units contained therein, and are not described in detail herein.
Referring now to fig. 3, a schematic diagram of an electronic device 300 suitable for use in implementing some embodiments of the present disclosure is shown. The electronic device shown in fig. 3 is merely an example and should not impose any limitations on the functionality and scope of use of embodiments of the present disclosure.
As shown in fig. 3, the electronic device 300 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 301 that may perform various suitable actions and processes in accordance with a program stored in a Read Only Memory (ROM) 302 or a program loaded from a storage means 308 into a Random Access Memory (RAM) 303. The RAM 303 also stores various programs and data required for the operation of the electronic device 300. The processing device 301, the ROM 302, and the RAM 303 are connected to each other via a bus 304. An input/output (I/O) interface 305 is also connected to the bus 304.
In general, the following devices may be connected to the I/O interface 305: input devices 306 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 307 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; and communication means 309. The communication means 309 may allow the electronic device 300 to communicate with other devices wirelessly or by wire to exchange data. While fig. 3 shows an electronic device 300 having various means, it is to be understood that not all of the illustrated means are required to be implemented or provided. More or fewer devices may be implemented or provided instead. Each block shown in fig. 3 may represent one device or a plurality of devices as needed.
In particular, according to some embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, some embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such embodiments, the computer program may be downloaded and installed from a network via communications device 309, or from storage device 308, or from ROM 302. The above-described functions defined in the methods of some embodiments of the present disclosure are performed when the computer program is executed by the processing means 301.
It should be noted that, the computer readable medium described in some embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In some embodiments of the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In some embodiments of the present disclosure, however, the computer-readable signal medium may comprise a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.
In some implementations, the clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be contained in the electronic device, or may exist alone without being incorporated into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquire a screen image to be cut; cut the screen image to be cut according to preset cutting sizes to obtain a cut screen image group, wherein each cut screen image in the cut screen image group consists of cut image units of the corresponding cutting size; perform component detection processing on the cut image units in each cut screen image in the cut screen image group to obtain a target component information set; perform text recognition on the screen image to be cut to obtain a text information set; and combine the target component information set and the text information set to obtain a screen image identification information set.
Computer program code for carrying out operations of some embodiments of the present disclosure may be written in one or more programming languages, including object oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter case, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in some embodiments of the present disclosure may be implemented by means of software, or may be implemented by means of hardware. The described units may also be provided in a processor, for example described as: a processor including an acquisition unit, a cutting unit, a component detection unit, a text recognition unit, and a combination unit. The names of these units do not in some cases constitute a limitation on the units themselves; for example, the acquisition unit may also be described as "a unit that acquires a screen image to be cut".
The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a system on a chip (SOC), a Complex Programmable Logic Device (CPLD), and the like.

Claims (9)

1. A screen image recognition method, comprising:
acquiring a screen image to be cut;
cutting the screen image to be cut according to preset cutting sizes to obtain a cut screen image group, wherein each cut screen image in the cut screen image group consists of cut image units of the corresponding cutting size;
performing component detection processing on the cut image units in each cut screen image in the cut screen image group to obtain a target component information set;
performing text recognition on the screen image to be cut to obtain a text information set;
and combining the target component information set and the text information set to obtain a screen image identification information set.
2. The method of claim 1, wherein screen image identification information in the screen image identification information set includes a component type and a component location, or a text content and a text location; and
the method further comprises the steps of:
and sending the screen image identification information set and the screen image to be cut to a target terminal for display.
3. The method of claim 1, wherein the performing text recognition on the screen image to be cut to obtain a text information set includes:
determining coordinates of each text region in the screen image to be cut to obtain a text region coordinate set;
cutting the screen image to be cut according to the text region coordinate set to obtain a text region image set;
and performing text recognition on each text region image in the text region image set to obtain a text information set.
4. The method of claim 3, wherein said performing text recognition on each text region image in the text region image set to obtain a text information set comprises:
inputting the text region image into a pre-trained text recognition model to obtain text content;
and combining the text content with the center point coordinates of the text region coordinates corresponding to the text region image in the text region coordinate set to obtain text information.
5. The method of claim 4, wherein the text recognition model comprises: a convolution layer, a recurrent layer, and a transcription layer; and
the inputting the text region image into a pre-trained text recognition model to obtain text content comprises:
inputting the text region image into the convolution layer to obtain a feature sequence;
inputting the feature sequence into the recurrent layer to obtain a label distribution of the feature sequence;
and inputting the label distribution into the transcription layer to obtain the text content.
6. The method of claim 1, wherein the cutting the screen image to be cut according to preset cutting sizes to obtain a cut screen image group comprises:
in response to determining that the length and width of the screen image to be cut are unequal, performing equal-length processing on the screen image to be cut to obtain a screen image to be cut of equal length and width;
and cutting the screen image to be cut of equal length and width to obtain a cut screen image group.
7. A screen image recognition apparatus comprising:
an acquisition unit configured to acquire a screen image to be cut;
the cutting unit is configured to cut the screen image to be cut according to preset cutting sizes to obtain a cut screen image group, wherein each cut screen image in the cut screen image group consists of cut image units of the corresponding cutting size;
the component detection unit is configured to perform component detection processing on the cut image units in each cut screen image in the cut screen image group to obtain a target component information set;
the text recognition unit is configured to perform text recognition on the screen image to be cut to obtain a text information set;
and the combination unit is configured to combine the target component information set and the text information set to obtain a screen image identification information set.
8. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-6.
9. A computer readable medium having stored thereon a computer program, wherein the program when executed by a processor implements the method of any of claims 1-6.
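The three-stage text recognition model of claim 5 (convolution layer, recurrent layer, transcription layer) is a CRNN-style architecture. As a hedged sketch, the transcription layer can be illustrated as greedy CTC decoding of the recurrent layer's per-step label distribution; the names `BLANK` and `transcription_layer` and the toy probability values are illustrative assumptions, not the patent's implementation:

```python
# Sketch of claim 5's transcription layer as greedy CTC decoding.
# The convolution and recurrent layers are assumed to have already
# produced `label_distribution`: one probability row per time step,
# over [blank, alphabet[0], alphabet[1], ...]. All names are assumptions.
BLANK = 0  # conventional CTC blank label index

def transcription_layer(label_distribution, alphabet):
    """Greedy CTC decode: argmax per step, collapse repeats, drop blanks."""
    best = [max(range(len(step)), key=step.__getitem__)
            for step in label_distribution]
    decoded, prev = [], BLANK
    for label in best:
        if label != BLANK and label != prev:
            decoded.append(alphabet[label - 1])
        prev = label
    return "".join(decoded)
```

Collapsing repeated labels and dropping blanks is what lets the transcription layer turn a per-frame label distribution of variable length into the final text content.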
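Claim 6's equal-length processing can likewise be sketched as padding the shorter side of a non-square screen image before cutting. The zero-fill strategy and the name `make_square` are assumptions for illustration; the disclosure does not specify the padding value:

```python
# Sketch of claim 6's equal-length processing: pad a non-square
# screen image to a square before cutting. Fill value is assumed.
import numpy as np

def make_square(image: np.ndarray, fill: int = 0) -> np.ndarray:
    """Pad the shorter dimension so that height == width."""
    h, w = image.shape[:2]
    side = max(h, w)
    padded = np.full((side, side) + image.shape[2:], fill, dtype=image.dtype)
    padded[:h, :w] = image  # original image occupies the top-left corner
    return padded
```

A square input guarantees that cutting with a size that divides the side length tiles the image exactly, with no partial units at the edges.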
CN202210599436.1A 2022-05-30 2022-05-30 Screen image recognition method, apparatus, device and computer readable medium Active CN114926830B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210599436.1A CN114926830B (en) 2022-05-30 2022-05-30 Screen image recognition method, apparatus, device and computer readable medium

Publications (2)

Publication Number Publication Date
CN114926830A CN114926830A (en) 2022-08-19
CN114926830B true CN114926830B (en) 2023-09-12

Family

ID=82811845

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210599436.1A Active CN114926830B (en) 2022-05-30 2022-05-30 Screen image recognition method, apparatus, device and computer readable medium

Country Status (1)

Country Link
CN (1) CN114926830B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10635413B1 (en) * 2018-12-05 2020-04-28 Bank Of America Corporation System for transforming using interface image segments and constructing user interface objects
CN111652266A (en) * 2020-04-17 2020-09-11 北京三快在线科技有限公司 User interface component identification method and device, electronic equipment and storage medium
CN111652208A (en) * 2020-04-17 2020-09-11 北京三快在线科技有限公司 User interface component identification method and device, electronic equipment and storage medium
KR102179552B1 (en) * 2019-05-15 2020-11-17 주식회사 한컴위드 Apparatus and method for collecting evidence based on ocr
CN112631586A (en) * 2020-12-24 2021-04-09 软通动力信息技术(集团)股份有限公司 Application development method and device, electronic equipment and storage medium
CN113377356A (en) * 2021-06-11 2021-09-10 四川大学 Method, device, equipment and medium for generating user interface prototype code
US11221833B1 (en) * 2020-03-18 2022-01-11 Amazon Technologies, Inc. Automated object detection for user interface generation
CN114205365A (en) * 2020-08-31 2022-03-18 华为技术有限公司 Application interface migration system and method and related equipment
CN114332118A (en) * 2021-12-23 2022-04-12 北京达佳互联信息技术有限公司 Image processing method, device, equipment and storage medium

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US8762873B2 (en) * 2009-10-26 2014-06-24 Hewlett-Packard Development Company, L.P. Graphical user interface component identification
US10489126B2 (en) * 2018-02-12 2019-11-26 Oracle International Corporation Automated code generation
US11487973B2 (en) * 2019-07-19 2022-11-01 UiPath, Inc. Retraining a computer vision model for robotic process automation

Non-Patent Citations (1)

Title
Advantages and Performance of Componentization in Application Interface Design; Hu Chao; Information & Communications (Issue 03); 290-292 *

Also Published As

Publication number Publication date
CN114926830A (en) 2022-08-19

Similar Documents

Publication Publication Date Title
US20230394671A1 (en) Image segmentation method and apparatus, and device, and storage medium
CN113313064B (en) Character recognition method and device, readable medium and electronic equipment
KR102002024B1 (en) Method for processing labeling of object and object management server
CN111784712B (en) Image processing method, device, equipment and computer readable medium
CN111783777B (en) Image processing method, apparatus, electronic device, and computer readable medium
CN115272182B (en) Lane line detection method, lane line detection device, electronic equipment and computer readable medium
CN113033552B (en) Text recognition method and device and electronic equipment
CN112183388B (en) Image processing method, device, equipment and medium
CN112418054B (en) Image processing method, apparatus, electronic device, and computer readable medium
CN116704473B (en) Obstacle information detection method, obstacle information detection device, electronic device, and computer-readable medium
CN110852242A (en) Watermark identification method, device, equipment and storage medium based on multi-scale network
CN118262188A (en) Object detection model training method, object detection information generating method and device
CN114926830B (en) Screen image recognition method, apparatus, device and computer readable medium
CN113807056B (en) Document name sequence error correction method, device and equipment
CN111680754B (en) Image classification method, device, electronic equipment and computer readable storage medium
CN112528970A (en) Guideboard detection method, device, equipment and computer readable medium
CN112070034A (en) Image recognition method and device, electronic equipment and computer readable medium
CN111797931A (en) Image processing method, image processing network training method, device and equipment
CN114359673B (en) Small sample smoke detection method, device and equipment based on metric learning
CN110991312A (en) Method, apparatus, electronic device, and medium for generating detection information
CN117824974B (en) Switch drop test method, device, electronic equipment and computer readable medium
CN114613355B (en) Video processing method and device, readable medium and electronic equipment
CN113256659B (en) Picture processing method and device and electronic equipment
CN115345931B (en) Object attitude key point information generation method and device, electronic equipment and medium
CN118229171B (en) Power equipment storage area information display method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant