CN116090415A - Method, device, computer device and storage medium for generating operation manual - Google Patents


Info

Publication number
CN116090415A
CN116090415A
Authority
CN
China
Prior art keywords
image
information
target
content
triggering
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310177428.2A
Other languages
Chinese (zh)
Inventor
芦萌
韩宏宇
张翼
王驰宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd (ICBC)
Priority to CN202310177428.2A
Publication of CN116090415A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/166Editing, e.g. inserting or deleting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0481Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/451Execution arrangements for user interfaces
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The present application relates to a method, apparatus, computer device, storage medium and computer program product for generating an operation manual. It relates to the field of artificial intelligence and can be used in financial technology and other fields. The method comprises the following steps: monitoring a triggering operation of a user in a display interface of a target application program, and generating an operation image corresponding to the triggering operation based on current interface information of the display interface; classifying the operation image with an image classification algorithm to obtain the operation type corresponding to the operation image; translating the operation image with a translation strategy matched to the operation type to obtain target translation content corresponding to the operation image; and generating an operation manual for the target application based on the target translation content. The method improves the efficiency of generating operation manuals.

Description

Method, device, computer device and storage medium for generating operation manual
Technical Field
The present application relates to the field of artificial intelligence technology, and in particular, to a method, an apparatus, a computer device, a storage medium, and a computer program product for generating an operation manual.
Background
The lifecycle of application development generally includes five basic stages: design, development, testing, acceptance, and promotion. As mobile applications have become widespread, their operation has also grown more complex, so an operation manual often has to be provided when an application is promoted, to guide users in correctly using all of its functions.
In the related art, the operation manual of an application is usually written manually by a worker (e.g., a promoter). The worker must be familiar with the application's operation flow and record each operation step by hand to form the manual, so the whole process is time-consuming and inefficient.
Disclosure of Invention
In view of the foregoing, it is desirable to provide an operation manual generation method, apparatus, computer device, computer-readable storage medium, and computer program product that can improve the efficiency of operation manual writing.
In a first aspect, the present application provides a method for generating an operation manual. The method comprises the following steps:
monitoring triggering operation of a user in a display interface of a target application program, and generating an operation image corresponding to the triggering operation based on current interface information of the display interface;
classifying the operation images by adopting an image classification algorithm to obtain operation types corresponding to the operation images;
performing translation processing on the operation image by adopting a translation strategy matched with the operation type to obtain target translation content corresponding to the operation image;
based on the target translation content, an operation manual for the target application is generated.
In one embodiment, the generating, based on the current interface information of the display interface, the operation image corresponding to the triggering operation includes:
determining trigger information corresponding to the trigger operation in the current interface information of the display interface; the trigger information comprises at least one of trigger object position information and trigger operation position information;
and adding an image mark in the current interface information of the display interface based on the trigger information to obtain an operation image corresponding to the trigger operation.
In one embodiment, after generating the operation image corresponding to the triggering operation based on the current interface information of the display interface, the method further includes:
generating image index information of an operation image corresponding to the triggering operation based on the triggering sequence of the triggering operation;
The generating an operation manual for the target application based on the target translation content includes:
when a plurality of operation images are provided, sorting target translation contents corresponding to the operation images according to image index information of the operation images, and generating an operation manual of the target application program based on the sorted target translation contents.
In one embodiment, the translating the operation image by using a translation policy matched with the operation type to obtain the target translation content corresponding to the operation image includes:
extracting content information corresponding to the triggering operation from the operation image based on a content extraction strategy matched with the operation type; the content information comprises triggering object content information corresponding to the triggering operation;
and generating target translation content corresponding to the operation image based on the translation content template information matched with the operation type and the content information corresponding to the triggering operation.
In one embodiment, the extracting, based on the content extraction policy matched with the operation type, content information corresponding to the triggering operation from the operation image includes:
When the operation type is an input operation type, inputting the operation image into an image recognition model to obtain a target input frame name image and a target input frame image which correspond to the triggering operation in the operation image;
and respectively carrying out character recognition on the target input frame name image and the target input frame image to obtain an input frame name and input content information.
In one embodiment, the extracting, based on the content extraction policy matched with the operation type, content information corresponding to the triggering operation from the operation image includes:
identifying clicking text information corresponding to the triggering operation in the operation image under the condition that the operation type is a text clicking operation type;
and extracting a click icon image corresponding to the triggering operation in the operation image under the condition that the operation type is the icon click operation type.
In one embodiment, the method further comprises:
displaying a chapter information input interface;
responding to the input completion operation of a user in the chapter information input interface, and acquiring a chapter image containing chapter information;
identifying chapter information contained in the chapter image to obtain target translation content corresponding to the chapter image;
The generating an operation manual for the target application based on the target translation content includes:
and generating an operation manual of the target application program based on the target translation content corresponding to the operation image and the target translation content corresponding to the chapter image.
In a second aspect, the present application further provides an apparatus for generating an operation manual. The device comprises:
the monitoring module is used for monitoring triggering operation of a user in a display interface of a target application program and generating an operation image corresponding to the triggering operation based on current interface information of the display interface;
the classifying module is used for classifying the operation images by adopting an image classifying algorithm to obtain operation types corresponding to the operation images;
the translation module is used for translating the operation image by adopting a translation strategy matched with the operation type to obtain target translation content corresponding to the operation image;
and the first generation module is used for generating an operation manual of the target application program based on the target translation content.
In one embodiment, the monitoring module is specifically configured to:
determining trigger information corresponding to the trigger operation in the current interface information of the display interface; the trigger information comprises at least one of trigger object position information and trigger operation position information; and adding an image mark in the current interface information of the display interface based on the trigger information to obtain an operation image corresponding to the trigger operation.
In one embodiment, the apparatus further comprises:
the second generation module is used for generating image index information of an operation image corresponding to the trigger operation based on the trigger sequence of the trigger operation;
the first generation module is specifically configured to:
when a plurality of operation images are provided, sorting target translation contents corresponding to the operation images according to image index information of the operation images, and generating an operation manual of the target application program based on the sorted target translation contents.
In one embodiment, the translation module is specifically configured to:
extracting content information corresponding to the triggering operation from the operation image based on a content extraction strategy matched with the operation type; the content information comprises triggering object content information corresponding to the triggering operation; and generating target translation content corresponding to the operation image based on the translation content template information matched with the operation type and the content information corresponding to the triggering operation.
In one embodiment, the translation module is specifically configured to:
when the operation type is an input operation type, inputting the operation image into an image recognition model to obtain a target input frame name image and a target input frame image which correspond to the triggering operation in the operation image; and respectively carrying out character recognition on the target input frame name image and the target input frame image to obtain an input frame name and input content information.
In one embodiment, the translation module is specifically configured to:
identifying clicking text information corresponding to the triggering operation in the operation image under the condition that the operation type is a text clicking operation type; and extracting a click icon image corresponding to the triggering operation in the operation image under the condition that the operation type is the icon click operation type.
In one embodiment, the apparatus further comprises:
the display module is used for displaying the chapter information input interface;
the acquisition module is used for responding to the input completion operation of the user in the chapter information input interface to acquire chapter images containing chapter information;
the identification module is used for identifying chapter information contained in the chapter image and obtaining target translation content corresponding to the chapter image;
the first generation module is specifically configured to:
and generating an operation manual of the target application program based on the target translation content corresponding to the operation image and the target translation content corresponding to the chapter image.
In a third aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor implementing the steps of the method of the first aspect when the processor executes the computer program.
In a fourth aspect, the present application also provides a computer-readable storage medium. The computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method of the first aspect.
In a fifth aspect, the present application also provides a computer program product. The computer program product comprising a computer program which, when executed by a processor, implements the steps of the method of the first aspect.
According to the method, apparatus, computer device, storage medium and computer program product for generating an operation manual described above, an operation image corresponding to each triggering operation is generated while the user operates the application, and the operation image is then translated with a translation strategy matched to its operation type to obtain translation content, from which the operation manual is generated. The operation manual is thus produced automatically as the user operates the application, sparing the user from manually recording operation steps, saving writing time and improving writing efficiency.
In addition, because the method determines the operation type by classifying the operation image, it can be implemented purely from the information on the application's front-end display interface, without accessing operation information in the application's back-end underlying data. This avoids consuming back-end resources, leaves the application's performance unaffected while the manual is generated, and makes the method applicable even in scenarios where the back-end data cannot be accessed, giving it a wider range of application.
Drawings
FIG. 1 is a flow diagram of a method of generating an operation manual in one embodiment;
FIG. 2 is a flow diagram of generating an operational image in one embodiment;
FIG. 3 is a flow diagram of obtaining target translation content in one embodiment;
FIG. 4 is a flow chart of a method of generating an operation manual in another embodiment;
FIG. 5 is a block diagram showing the construction of an operation manual generation device in one embodiment;
fig. 6 is an internal structural diagram of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
First, before the technical solutions of the embodiments of the present application are described in detail, the technical background or evolution context on which they are based is introduced. As mobile applications have become widespread, their operation has also grown more complex, so a detailed operation manual often has to be provided when an application is promoted, to guide users in correctly using its various functions. In the related art, the operation manual of an application is usually written manually by a worker (e.g., a promoter). The worker must be familiar with the application's operation flow and record each operation step by hand, typically a textual description of the step (such as clicking a button or control) together with the corresponding interface screenshot, and then assemble the steps into a manual. The whole process consumes considerable time and labor and is inefficient.
Against this background, through long-term research, development and experimental verification, the applicant proposes the method for generating an operation manual of the present application. While a user (such as a tester performing software acceptance) operates the application, the method translates each operation image with a translation strategy matched to its operation type to obtain translation content, so the operation manual is generated automatically, sparing the user from manually recording operation steps, saving writing time and improving writing efficiency. In addition, because the method determines the operation type by classifying the operation image, it can be implemented purely from the information on the application's front-end display interface, without accessing operation information in the application's back-end underlying data; this avoids consuming back-end resources, leaves the application's performance unaffected while the manual is generated, and makes the method applicable even where back-end data cannot be accessed, giving it a wider range of application. Identifying the technical problems addressed here and arriving at the technical solutions of the following embodiments required considerable creative effort on the applicant's part.
In one embodiment, as shown in fig. 1, a method for generating an operation manual is provided. The method can be applied to a terminal; it may also be applied to a server, or to a system comprising the terminal and the server and implemented through their interaction. The terminal may be, but is not limited to, a personal computer, notebook computer, smart phone, tablet computer, internet-of-things device, portable wearable device, and the like. In this embodiment, the method includes the following steps:
step 101, monitoring triggering operation of a user in a display interface of a target application program, and generating an operation image corresponding to the triggering operation based on current interface information of the display interface.
Here, the target application is the application for which an operation manual is to be generated. The terminal can have the target application installed and can display its display interface.
In implementation, the terminal monitors the user's triggering operations in the display interface of the target application. When any triggering operation is detected, it generates an operation image corresponding to that operation based on the current interface information of the display interface. The operation image is used for subsequent operation-type classification and translation. It may be an image of the trigger object corresponding to the triggering operation in the current interface, i.e., a local image of the trigger object, or it may be a complete image of the current interface; the capture range can be set as required, for example different capture modes for different types of triggering operations. The trigger object may be an icon, a text button, an input box, or a display page in the interface, and different triggering operations have different trigger objects: the trigger object of a click operation may be an icon or a text button, that of an input operation an input box, and that of a swipe operation a display page.
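As a purely illustrative sketch (not part of the patent disclosure — the function names, the event representation, and the 200-pixel local-patch size are assumptions), the per-type capture rule described above might look like:

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class TriggerEvent:
    op_kind: str                  # e.g. "click", "input", "swipe"
    position: Tuple[int, int]     # (x, y) in screen coordinates

def capture_region(event: TriggerEvent,
                   screen_size: Tuple[int, int],
                   local_box: int = 200) -> Tuple[int, int, int, int]:
    """Return the (left, top, right, bottom) region of the screenshot to keep.

    Clicks and inputs keep a local patch around the trigger object; swipes
    keep the full interface, mirroring the per-type capture rules above.
    """
    w, h = screen_size
    if event.op_kind == "swipe":
        return (0, 0, w, h)          # full-interface image
    x, y = event.position
    half = local_box // 2
    # clamp the local patch to the screen bounds
    left, top = max(0, x - half), max(0, y - half)
    right, bottom = min(w, x + half), min(h, y + half)
    return (left, top, right, bottom)
```

A real implementation would then crop the screenshot to this region; the patch size per operation type is a tunable design choice.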
Step 102, classifying the operation images by adopting an image classification algorithm to obtain operation types corresponding to the operation images.
In implementation, after obtaining the operation image corresponding to the triggering operation, the terminal classifies it with an image classification algorithm to obtain its operation type. Operation types may include click, input, swipe, and so on; click operations may be further divided into text clicks and icon clicks, with the granularity set as required. Operation images of different types carry different feature information: images of click operations generally contain controls such as click buttons, while images of input operations generally contain input boxes (square, rounded, underlined, etc.), so an image classification algorithm can distinguish them. For example, an image classification model may be built in advance and trained on sample images labelled with operation types; the trained model then classifies each operation image to yield its operation type.
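The patent calls for a trained image-classification model; as a minimal stand-in (illustrative only — the feature names and rule order are assumptions, and a real system would use a classifier trained on labelled screenshots rather than hand-written rules), the type decision can be sketched over pre-extracted visual cues:

```python
def classify_operation(features: dict) -> str:
    """Toy stand-in for the trained image-classification model.

    A real implementation would feed the operation image itself to a model
    trained on screenshots labelled with operation types; here the visual
    cues (presence of an input box, icon, or text button) are assumed to be
    pre-extracted into boolean features.
    """
    if features.get("has_input_box"):
        return "input"
    if features.get("has_icon"):
        return "icon_click"
    if features.get("has_text_button"):
        return "text_click"
    return "swipe"
```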
Step 103, performing translation processing on the operation image by adopting a translation strategy matched with the operation type to obtain target translation content corresponding to the operation image.
In implementation, after classifying an operation image, the terminal translates it with the translation strategy matched to its operation type to obtain the corresponding target translation content. Different operation types use different translation strategies, and the correspondence between types and strategies can be established in advance. A translation strategy may consist in recognising key information in the operation image, such as content information or interface identification information, and then producing the target translation content from that key information and template information. For example, for an operation image of the click type, the content of the clicked object (text or icon) may be recognised, and the corresponding target translation content may read 'click the "object"'.
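The type-to-strategy correspondence with template filling can be sketched as follows (illustrative only; the template wordings and the `translate` function are assumptions, not the wording fixed by the disclosure):

```python
# Hypothetical per-type translation-content templates, established in advance.
TEMPLATES = {
    "text_click": 'Click the "{target}" button',
    "icon_click": 'Tap the {target} icon',
    "input":      'Enter "{content}" in the "{target}" field',
    "swipe":      'Swipe the {target} page',
}

def translate(op_type: str, target: str, content: str = "") -> str:
    """Fill the template matched to the operation type with the content
    information extracted from the operation image (e.g. via OCR)."""
    return TEMPLATES[op_type].format(target=target, content=content)
```

In a full system `target` and `content` would come from character recognition on the marked regions of the operation image.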
Step 104, generating an operation manual of the target application program based on the target translation content.
In implementation, the terminal may generate the operation manual from the target translation content of the operation images according to a preset document format. If there are multiple operation images, their translation contents are concatenated in operation order and the manual is generated in that format. The manual may also include the operation images themselves: the target translation content is written into the document in the preset format and the operation images are inserted alongside it, yielding a manual that contains both textual descriptions and images of the operation steps, which is easier for readers to follow.
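A minimal sketch of assembling the ordered translation contents into a document (illustrative only; the plain-text numbered format and the `build_manual` name are assumptions — the disclosure leaves the document format as a preset):

```python
def build_manual(title: str, steps: list) -> str:
    """Join ordered translation contents into a numbered plain-text manual.

    A richer implementation would emit a formatted document and insert the
    corresponding operation image after each step's text.
    """
    lines = [f"Operation manual: {title}", ""]
    for i, step in enumerate(steps, start=1):
        lines.append(f"Step {i}: {step}")
    return "\n".join(lines)
```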
According to this method for generating an operation manual, an operation image corresponding to each triggering operation is generated while the user operates the application, and the image is then translated with a translation strategy matched to its operation type to obtain translation content from which the manual is generated. The manual is thus produced automatically as the user operates the application, sparing the user from manually recording operation steps, saving writing time and improving writing efficiency. In addition, because the method determines the operation type by classifying the operation image, it can be implemented purely from the front-end display interface information, without accessing operation information in the application's back-end underlying data; this avoids consuming back-end resources, leaves the application's performance unaffected while the manual is generated, and makes the method applicable even where back-end data cannot be accessed, giving it a wider range of application.
In one embodiment, as shown in fig. 2, the process of generating an operation image in step 101 specifically includes the following steps:
Step 201, determining trigger information corresponding to the trigger operation in the current interface information of the display interface.
Wherein the trigger information includes at least one of trigger object position information and trigger operation position information.
In implementation, when the terminal detects the user's triggering operation, it can take a screenshot to obtain a current interface image, determine the position of the triggering operation in the current interface (the triggering-operation position information), and derive the trigger information from it. For example, for a click or input operation, the corresponding trigger object (the graphical interface element at the trigger position in the current interface) may be determined from the triggering-operation position information, and the position of that object used as the trigger information. For a swipe operation, the terminal may use the swipe's start position and path as the trigger information corresponding to the triggering operation.
Step 202, adding an image mark in the current interface information of the display interface based on the trigger information to obtain an operation image corresponding to the trigger operation.
In implementation, after determining the trigger information, the terminal may add an image mark to the current interface image according to the position information in the trigger information. For example, if the trigger information is trigger object position information, the terminal may add an image mark, such as a mark box, to the image area matching the trigger object position information, so as to mark the position of the trigger object (such as a clicked button or an input box). If the trigger information is trigger operation position information (such as the initial position information and swipe path information of a swipe operation), the swipe initial position and the swipe path can be marked in the current interface image. An operation image corresponding to the triggering operation is thereby obtained.
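The mark-box case can be sketched as below. To keep the sketch dependency-free, the screenshot is modeled as a row-major 2D list of pixel values; a real implementation would presumably draw on the captured image with an imaging library (e.g. a rectangle-drawing call).

```python
def add_mark_box(pixels, bbox, mark=9):
    """Draw a rectangular mark (outline only) onto the interface image
    at the area matching the trigger object position information.

    `pixels` is a row-major 2D list standing in for the current interface
    image; `bbox` is (left, top, right, bottom) taken from the trigger
    information. The marked result is the operation image.
    """
    left, top, right, bottom = bbox
    for x in range(left, right + 1):       # top and bottom edges
        pixels[top][x] = mark
        pixels[bottom][x] = mark
    for y in range(top, bottom + 1):       # left and right edges
        pixels[y][left] = mark
        pixels[y][right] = mark
    return pixels
```

Marking a swipe would follow the same idea, writing the mark value along the swipe path instead of along a box outline.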
In this embodiment, when a triggering operation of the user is monitored, the current interface is captured and marked at the position of the triggering operation to obtain the operation image, so that the operation image can be classified based on its mark and captured content. This avoids the performance degradation that would result from acquiring operation information from the back-end underlying data of the application program, while ensuring classification accuracy. Moreover, the acquired operation image can serve as the image display part of each operation step in the operation manual, presenting the operation steps more intuitively and clearly and making them easier for the user to understand.
In one embodiment, after the operation image is generated in step 101, the method further includes a step of generating image index information, which specifically includes: generating image index information of the operation image corresponding to the triggering operation based on the trigger sequence of the triggering operations. Correspondingly, the process of generating the operation manual in step 104 specifically includes: when there are a plurality of operation images, sorting the target translation contents corresponding to the operation images according to the image index information of the operation images, and generating the operation manual of the target application program based on the sorted target translation contents.
In implementation, a user typically performs multiple triggering operations during use, and multiple operation images are obtained accordingly. Image index information corresponding to each triggering operation can therefore be generated based on the trigger sequence of the triggering operations. The image index information both identifies each operation image and reflects the relative order of the operation images. For example, the operation image obtained from the first triggering operation may be labeled "1" and the one obtained from the second triggering operation labeled "2"; "1" and "2" are then the image index information of the respective operation images. After identifying the target translation content corresponding to each operation image, the terminal can sort the target translation contents according to the image index information of the operation images, obtaining the target translation contents ordered by trigger sequence. The sorted target translation contents can then be written into a document one by one in that order, generating an operation manual whose operation steps are in the correct sequence.
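The sort-and-write step reduces to ordering by index and emitting numbered steps, as in the following sketch (the step wording and mapping shape are illustrative, not prescribed by the method):

```python
def assemble_manual(entries):
    """Sort target translation contents by image index information and
    join them into numbered operation-manual steps.

    `entries` maps image index information (assigned in trigger order)
    to the target translation content recognized for that operation image.
    """
    ordered = [entries[k] for k in sorted(entries)]
    return "\n".join(f"Step {i}: {text}" for i, text in enumerate(ordered, 1))
```

Because the index information encodes the trigger sequence, the resulting document lists the operation steps in the order the user actually performed them.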
Optionally, the terminal may add the image index information, operation type, and target translation content of each operation image to a translation information table, sorted in the order reflected by the image index information. In one example, the translation information table is shown in Table 1. The terminal can read the target translation contents in the translation information table one by one and write them into a document in a text format matched with the operation type, so as to generate an operation manual with accurate operation steps. In addition, before generating the operation manual, the terminal can display the translation information table to the user for modification and confirmation, and then generate the operation manual based on the confirmed translation information table, further ensuring the accuracy of the operation manual.
Table 1 translation information table
It can be understood that, for an operation image of the swipe operation type, the terminal can identify a swipe operation type reflecting the swipe direction from the marks added to the operation image (i.e., the swipe initial position and the swipe path), and the swipes can be divided into two broad categories, with and without a progress bar, according to whether the swipe operation position contains a progress bar. The swipe operation type may thus be subdivided by swipe direction and by whether a progress bar is present, for example into: left swipe without progress bar, right swipe without progress bar, up swipe without progress bar, down swipe without progress bar, left swipe with progress bar, right swipe with progress bar, up swipe with progress bar, and down swipe with progress bar. The translation content template information corresponding to the specific swipe operation type can then be taken as the target translation content; for example, the translation content template information corresponding to a left swipe without a progress bar may be "Swipe the screen to the left".
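A minimal sketch of this subdivision follows. The template wordings and the direction heuristic (comparing horizontal and vertical displacement between the marked start point and the end of the swipe path) are illustrative assumptions, not text from the patent.

```python
SWIPE_TEMPLATES = {
    # (direction, has_progress_bar) -> translation content template
    ("left", False): "Swipe the screen to the left",
    ("right", False): "Swipe the screen to the right",
    ("up", False): "Swipe the screen upward",
    ("down", False): "Swipe the screen downward",
    ("left", True): "Drag the progress bar to the left",
    ("right", True): "Drag the progress bar to the right",
    # remaining progress-bar variants follow the same pattern
}

def classify_swipe(start, end, has_progress_bar=False):
    """Infer the swipe direction from the marked initial position and the
    end of the swipe path, then look up the matching template."""
    dx, dy = end[0] - start[0], end[1] - start[1]
    if abs(dx) >= abs(dy):
        direction = "left" if dx < 0 else "right"
    else:
        direction = "up" if dy < 0 else "down"
    return SWIPE_TEMPLATES.get((direction, has_progress_bar), direction)
```

For swipe types, the looked-up template is used directly as the target translation content, since there is no trigger object text to substitute in.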
In this embodiment, image index information is generated for each operation image according to the trigger sequence, so that the target translation contents corresponding to the operation images can be arranged according to the image index information, yielding an operation manual with an accurate operation sequence. Furthermore, on this basis, while the order of the operation steps in the manual remains accurate, the operation image generation of step 101 and the subsequent steps (the image classification, translation, and manual generation of steps 102 to 104) can be executed separately by different terminals; the two terminals only need to exchange the operation images, so the interaction data is simple and the efficiency is high. For example, a first terminal (such as a mobile phone) can monitor the triggering operations on an application program installed on it, obtaining a plurality of operation images numbered in trigger sequence. The first terminal may then send the operation images to a second terminal (such as a computer), which performs the subsequent classification and translation processing and generates an operation manual with accurate operation steps based on the index information of the operation images. This suits cases where the first terminal cannot run programs such as complex image classification algorithms because of performance limitations.
In one embodiment, as shown in fig. 3, the process of obtaining the target translation content by the translation processing in step 103 specifically includes the following steps:
step 301, extracting content information corresponding to the triggering operation from the operation image based on the content extraction policy matched with the operation type.
The content information comprises trigger object content information corresponding to the triggering operation. Content extraction strategies corresponding to the different operation types can be preset, so that content information is extracted from images of different operation types using a matched strategy. For example, for the click operation type, the text content of the clicked trigger object (e.g., a text button) may be extracted; for an input operation, the input box identifier (e.g., its name) and the text content within the input box may be extracted.
Step 302, generating target translation content corresponding to the operation image based on the translation content template information matched with the operation type and the content information corresponding to the trigger operation.
In implementation, translation content template information corresponding to each operation type may be preset. The translation content template information may contain text reflecting the execution action of the triggering operation, so that after reading it the user can refer to the described action to correctly trigger the corresponding operation. For example, the translation content template information for the click operation may be 'Click "A"', where "A" is replaced with the content information obtained in step 301. If the trigger object corresponding to the click operation is the "Login" control, the extracted content information is "Login", and replacing "A" in the template information with "Login" yields the target translation content 'Click "Login"'.
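The template substitution described here is plain placeholder filling, sketched below with Python format strings; the template texts and the keys "A"/"B" mirror the placeholders used in the examples above but are otherwise illustrative.

```python
TEMPLATES = {
    # operation type -> translation content template information
    "text_click": 'Click "{A}"',
    "input": 'Enter "{B}" in the "{A}" input box',
}

def render_translation(op_type, content_info):
    """Fill the translation content template matched to the operation type
    with the content information extracted from the operation image,
    producing the target translation content."""
    return TEMPLATES[op_type].format(**content_info)
```

For the "Login" example above, `render_translation("text_click", {"A": "Login"})` yields the target translation content `Click "Login"`.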
In this embodiment, by extracting content information corresponding to a trigger operation in an operation image, based on the translated content template information and the extracted content information, translated content corresponding to the operation image may be generated, where the translated content includes content information of an operation action and an operation object, so that a user may conveniently read and understand operation steps, thereby guiding the user to correctly use an application program.
In one embodiment, the process of extracting content information in step 301 specifically includes the following steps: under the condition that the operation type is the input operation type, inputting the operation image into the image recognition model to obtain a target input frame name image and a target input frame image which correspond to the triggering operation in the operation image; and respectively carrying out character recognition on the target input frame name image and the target input frame image to obtain the input frame name and the input content information.
In implementations, the operation types may include an input operation type. For an operation image of the input operation type, the terminal may feed the operation image into the image recognition model to identify the target input box name image and the target input box image corresponding to the triggering operation in the operation image. In practical applications, input boxes come in various styles: square boxes, rounded-corner boxes (squares with arcs at the four corners), underline boxes, and so on. Input boxes are generally laid out adjacent to their names, and the layouts also vary: the name may sit to the left of, above, to the right of, or below the input box. Training sample images containing input boxes of different styles and different name layouts can therefore be used to train the image recognition model, improving its recognition accuracy and robustness. The trained image recognition model can then recognize the input box image and the input box name image in the operation image (or at the marked position in the operation image). Next, the terminal can apply an OCR (Optical Character Recognition) model to the input box name image and the input box image separately, obtaining the input box name and the input content information, which serve as the two pieces of extracted content information used to replace the corresponding placeholders in the translation content template information, yielding the target translation content. In one example, the translation content template information corresponding to the input operation type is 'Enter "B" in the "A" input box', where "A" is replaced with the extracted input box name and "B" with the extracted input content information, thereby obtaining the target translation content.
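The detect-then-recognize pipeline can be sketched as below. Both stages are injected as callables: `detect_regions` stands in for the trained image recognition model (returning the two crops), and `ocr` for an OCR engine such as a Tesseract wrapper, so the flow can be shown without model weights or an OCR dependency.

```python
def extract_input_content(op_image, detect_regions, ocr):
    """Content extraction for input-type operation images.

    First locate the target input box name image and the target input box
    image with the recognition model, then run character recognition on
    each crop. The result supplies the "A" (input box name) and "B"
    (input content) placeholders of the input-type template.
    """
    name_crop, box_crop = detect_regions(op_image)
    return {"A": ocr(name_crop).strip(), "B": ocr(box_crop).strip()}
```

The returned mapping can be substituted into the template 'Enter "B" in the "A" input box' to produce the target translation content.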
In this embodiment, for an operation image of an input operation type, an input frame image and an input frame name image corresponding to a trigger operation are identified through an image identification model, and then text identification is performed on the two images respectively, so that the input frame name and the input content corresponding to the trigger operation can be accurately identified, and further translation content including the input frame name and the input content is generated, so that a user can understand the operation step more clearly.
In one embodiment, the operation types include a text click operation type and an icon click operation type, and the process of extracting the content information in the foregoing step 301 specifically includes the following steps: in the case that the operation type is the text click operation type, recognizing the click text information corresponding to the triggering operation in the operation image; and in the case that the operation type is the icon click operation type, extracting the clicked icon image corresponding to the triggering operation in the operation image.
In implementation, the text click operation type refers to an operation type in which the clicked trigger object is a text control or a text button control, and the icon click operation type refers to one in which the clicked trigger object is an icon control. For an operation image of the text click operation type, the terminal can directly recognize the text information contained in the trigger object (i.e., the click text information corresponding to the triggering operation) as the content information corresponding to the operation image. For an operation image of the icon click operation type, the terminal can take an image of the trigger object (i.e., the image area where the mark was added in the operation image), that is, the clicked icon image corresponding to the triggering operation, as the content information corresponding to the operation image.
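The two branches can be sketched as a dispatch on the click sub-type. `read_text` and `crop_marked` are hypothetical helpers standing in for the OCR step and the marked-region crop, respectively:

```python
def extract_click_content(op_image, op_type, read_text, crop_marked):
    """Content extraction for the two click sub-types: recognize the
    trigger object's text for a text click, or crop the marked icon
    region for an icon click."""
    if op_type == "text_click":
        return {"kind": "text", "value": read_text(op_image)}
    if op_type == "icon_click":
        return {"kind": "icon", "value": crop_marked(op_image)}
    raise ValueError(f"not a click operation type: {op_type}")
```

For a text click, the returned value fills the template placeholder directly; for an icon click, the cropped icon image itself is inserted into the manual in place of a text description.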
In this embodiment, the click operation may specifically include a text click operation and an icon click operation. For the different click operation types, either the text of the clicked object or its icon is used as the content information that replaces the corresponding placeholder in the translation content template information when generating the operation manual, so an operation manual that matches the actual operations and is easy to read is obtained, conveniently guiding the user to use the application program correctly.
In one embodiment, as shown in fig. 4, the method further includes a process of acquiring and translating the chapter images, and specifically includes the following steps:
step 401, displaying a chapter information entry interface.
In implementation, the operation manual can contain chapter information and corresponding operation steps under each chapter, so that the application program using process can be displayed with clear structure, and the user can conveniently read and understand the application program. To obtain chapter information, a chapter information entry plug-in may be installed at the terminal and the plug-in control (e.g., a control in the form of a hover icon) may be displayed in a display interface. The user can click the plugin control, namely, a chapter information input request is triggered, and the terminal can respond to the request to display a chapter information input interface.
Step 402, acquiring a chapter input page image containing chapter information in response to the completion of the user input operation in the chapter information input interface.
In implementations, the user can enter chapter information in the chapter information entry interface. The chapter information may include a chapter number (e.g., Chapter 1), a title, a chapter content description, and the like. After entering the chapter information, the user can trigger an entry completion operation, such as clicking a completion button on the chapter information entry interface. In response, the terminal can capture the current interface to obtain a chapter image containing the chapter information. It can be understood that after the user clicks the entry completion button or the exit button, the terminal may exit the chapter information entry interface and display the interface of the target application program, whereupon the terminal can begin monitoring the user's triggering operations in the display interface of the target application program.
Step 403, identifying chapter information contained in the chapter image, and obtaining target translation content corresponding to the chapter image.
In implementation, the terminal may identify chapter information contained in the chapter image as target translation content corresponding to the chapter entry page image. Thus, an operation manual including chapter information and operation steps can be generated based on the target translation content corresponding to the chapter image and the target translation content corresponding to the operation image.
It can be understood that the operation manual may include a plurality of chapters. The user may enter chapter information one chapter at a time and, after each entry, trigger the operation steps under that chapter on the display interface of the target application program. The terminal can thus generate image index information for each chapter image and each operation image according to the chapter entry sequence and the trigger sequence of the triggering operations, identifying each image and reflecting the order of the chapter images and operation images. The target translation contents can then be sorted based on the image index information of the chapter images and operation images, generating an operation manual in which both the chapter order and the operation steps under each chapter are accurate. After the terminal acquires the chapter images and operation images, an image classification algorithm may be used in step 102 to classify each image, determining which are chapter images and which are operation images, as well as the specific operation type of each operation image. The terminal can then translate each image using the translation strategy matched with its image category to obtain the corresponding target translation content, and thereby generate the operation manual.
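Interleaving chapter images and operation images by their shared index can be sketched as below; the heading/step formatting is an illustrative choice, not part of the claimed method.

```python
def build_manual(items):
    """Assemble a manual from mixed chapter and operation images.

    `items` is a list of (index, kind, translated_content) tuples, where
    the indices reflect the combined entry/trigger order. Chapter entries
    become headings, and operation entries become steps numbered from 1
    under the most recent chapter.
    """
    lines, step = [], 0
    for _, kind, text in sorted(items):
        if kind == "chapter":
            lines.append(text)
            step = 0          # restart step numbering for the new chapter
        else:
            step += 1
            lines.append(f"  {step}. {text}")
    return "\n".join(lines)
```

Because one index sequence covers both image kinds, each operation step lands under the chapter entered immediately before it was triggered.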
In this embodiment, the terminal may further display a chapter information entry interface, so that a user may enter chapter information, and after the user enters the chapter information, the terminal may obtain a chapter image containing the chapter information, and then may identify the chapter information contained in the chapter image as target translation content corresponding to the chapter image, so that an operation manual containing the chapter information and operation steps may be obtained based on the target translation content of the chapter image and the target translation content of the operation image, which is more convenient for the user to read.
It should be understood that, although the steps in the flowcharts of the above embodiments are shown in the order indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, the steps are not strictly limited to this order of execution and may be executed in other orders. Moreover, at least some of the steps in the flowcharts of the above embodiments may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times; nor is their order of execution necessarily sequential, as they may be performed in turn or alternately with at least some of the other steps, sub-steps, or stages.
Based on the same inventive concept, an embodiment of the present application also provides an operation manual generation device for implementing the above method of generating an operation manual. The implementation of the solution provided by the device is similar to that described for the method above, so for specific limitations in the embodiments of the one or more operation manual generation devices provided below, reference may be made to the limitations on the method of generating an operation manual described above, and details are not repeated here.
In one embodiment, as shown in fig. 5, there is provided an operation manual generation apparatus 500, including: a monitoring module 501, a classification module 502, a translation module 503, and a first generation module 504, wherein:
the monitoring module 501 is configured to monitor a triggering operation of a user in a display interface of a target application program, and generate an operation image corresponding to the triggering operation based on current interface information of the display interface.
The classifying module 502 is configured to perform classification processing on the operation image by using an image classifying algorithm, so as to obtain an operation type corresponding to the operation image.
The translation module 503 is configured to perform translation processing on the operation image by using a translation policy matched with the operation type, so as to obtain a target translation content corresponding to the operation image.
A first generation module 504 is configured to generate an operation manual of the target application program based on the target translation content.
In one embodiment, the monitoring module 501 is specifically configured to: determining trigger information corresponding to the trigger operation in the current interface information of the display interface; the trigger information includes at least one of trigger object position information and trigger operation position information; and adding an image mark in the current interface information of the display interface based on the trigger information to obtain an operation image corresponding to the trigger operation.
In one embodiment, the apparatus further includes a second generating module, configured to generate, based on the trigger sequence of the triggering operations, image index information of the operation images corresponding to the triggering operations. Correspondingly, the first generation module 504 is specifically configured to: when there are a plurality of operation images, sort the target translation contents corresponding to the operation images according to the image index information of the operation images, and generate the operation manual of the target application program based on the sorted target translation contents.
In one embodiment, the translation module 503 is specifically configured to: extracting content information corresponding to the triggering operation from the operation image based on a content extraction strategy matched with the operation type; the content information comprises triggering object content information corresponding to triggering operation; the target translation content corresponding to the operation image is generated based on the translation content template information matching the operation type and the content information corresponding to the trigger operation.
In one embodiment, the translation module 503 is specifically configured to: under the condition that the operation type is the input operation type, inputting the operation image into the image recognition model to obtain a target input frame name image and a target input frame image which correspond to the triggering operation in the operation image; and respectively carrying out character recognition on the target input frame name image and the target input frame image to obtain the input frame name and the input content information.
In one embodiment, the translation module 503 is specifically configured to: under the condition that the operation type is a text clicking operation type, clicking text information corresponding to triggering operation in an operation image is identified; and extracting a click icon image corresponding to the triggering operation in the operation image under the condition that the operation type is the icon click operation type.
In one embodiment, the apparatus further comprises a display module, an acquisition module and an identification module, wherein:
and the display module is used for displaying the chapter information input interface.
And the acquisition module is used for responding to the input completion operation of the user in the chapter information input interface and acquiring the chapter image containing the chapter information.
And the identification module is used for identifying chapter information contained in the chapter image and obtaining target translation content corresponding to the chapter image.
Accordingly, the first generating module 504 is specifically configured to: an operation manual of the target application is generated based on the target translation content corresponding to the operation image and the target translation content corresponding to the chapter image.
The respective modules in the above-described operation manual generation device may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.
In one embodiment, a computer device is provided, which may be a terminal, and the internal structure of which may be as shown in fig. 6. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The communication interface of the computer device is used for carrying out wired or wireless communication with an external terminal, and the wireless mode can be realized through WIFI, a mobile cellular network, NFC (near field communication) or other technologies. The computer program, when executed by a processor, implements a method of generating an operating manual. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, can also be keys, a track ball or a touch pad arranged on the shell of the computer equipment, and can also be an external keyboard, a touch pad or a mouse and the like.
It will be appreciated by those skilled in the art that the structure shown in fig. 6 is merely a block diagram of some of the structures associated with the present application and is not limiting of the computer device to which the present application may be applied, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the method embodiments described above when the computer program is executed.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, implements the steps of the method embodiments described above.
In an embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
The present application provides a method, a device, a computer device, a storage medium, and a computer program product for generating an operation manual. They relate to the technical field of artificial intelligence and can be used in the financial field or other fields; the application field is not limited.
It should be noted that, user information (including but not limited to user equipment information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, presented data, etc.) referred to in the present application are information and data authorized by the user or sufficiently authorized by each party.
Those skilled in the art will appreciate that implementing all or part of the above methods may be accomplished by a computer program stored on a non-transitory computer-readable storage medium which, when executed, may comprise the steps of the method embodiments described above. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. Volatile memory may include random access memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM can take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases referred to in the embodiments provided herein may include at least one of relational databases and non-relational databases. Non-relational databases may include, but are not limited to, blockchain-based distributed databases and the like. The processors referred to in the embodiments provided herein may be, without limitation, general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic units, quantum-computing-based data processing logic units, and the like.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of technical features contains no contradiction, it should be considered within the scope of this specification.
The above examples represent only a few embodiments of the present application; although described in considerable detail, they are not to be construed as limiting the scope of the application. It should be noted that those skilled in the art could make various modifications and improvements without departing from the spirit of the present application, and such modifications and improvements fall within its scope of protection. Accordingly, the scope of protection of the present application shall be subject to the appended claims.

Claims (11)

1. A method of generating an operating manual, the method comprising:
monitoring triggering operation of a user in a display interface of a target application program, and generating an operation image corresponding to the triggering operation based on current interface information of the display interface;
classifying the operation images by adopting an image classification algorithm to obtain operation types corresponding to the operation images;
performing translation processing on the operation image by adopting a translation strategy matched with the operation type to obtain target translation content corresponding to the operation image;
based on the target translation content, an operation manual for the target application is generated.
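Claim 1's four steps form a single pipeline: capture an operation image, classify it, translate it with a type-matched strategy, and assemble the manual. A minimal sketch of that flow is below; all names (`classify_operation`, `TRANSLATION_STRATEGIES`, the dict-based "images") are hypothetical stand-ins, not the patent's actual implementation.

```python
# Hypothetical sketch of the claimed pipeline. Real operation images would be
# screenshots; here a dict with an embedded type tag stands in for the output
# of the image classification algorithm.

def classify_operation(image):
    # Stand-in for the image classification step: read the embedded tag.
    return image["type"]

def translate_click(image):
    return f'Click "{image["label"]}".'

def translate_input(image):
    return f'In the "{image["label"]}" field, enter "{image["value"]}".'

# One translation strategy per operation type, as claim 1 requires.
TRANSLATION_STRATEGIES = {"click": translate_click, "input": translate_input}

def generate_manual(operation_images):
    """Classify each operation image, translate it with the matching
    strategy, and join the numbered results into a manual."""
    steps = []
    for i, image in enumerate(operation_images, start=1):
        op_type = classify_operation(image)
        content = TRANSLATION_STRATEGIES[op_type](image)
        steps.append(f"{i}. {content}")
    return "\n".join(steps)

demo = [
    {"type": "click", "label": "Login"},
    {"type": "input", "label": "Username", "value": "alice"},
]
print(generate_manual(demo))
```

The dispatch table makes the "strategy matched with the operation type" selection explicit: adding a new operation type means registering one new translation function.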
2. The method according to claim 1, wherein generating the operation image corresponding to the trigger operation based on the current interface information of the display interface includes:
determining trigger information corresponding to the trigger operation in the current interface information of the display interface; the trigger information comprises at least one of trigger object position information and trigger operation position information;
and adding an image mark in the current interface information of the display interface based on the trigger information to obtain an operation image corresponding to the trigger operation.
3. The method according to claim 1, wherein, after the generating of the operation image corresponding to the trigger operation based on the current interface information of the display interface, the method further comprises:
generating image index information of an operation image corresponding to the triggering operation based on the triggering sequence of the triggering operation;
the generating an operation manual for the target application based on the target translation content includes:
when there are a plurality of operation images, sorting the target translation contents corresponding to the operation images according to the image index information of the operation images, and generating the operation manual of the target application program based on the sorted target translation contents.
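Claim 3's ordering step can be sketched in a few lines: each translated content carries the index derived from its trigger sequence, and the manual is assembled in index order. The `(index, content)` pair representation is an illustrative assumption.

```python
# Sketch of claim 3: sort target translation contents by image index
# (reflecting the trigger sequence) before assembling the manual.

def assemble_manual(translated):
    """translated: list of (image_index, target_translation_content)."""
    ordered = sorted(translated, key=lambda pair: pair[0])
    return "\n".join(content for _, content in ordered)

# Contents may arrive out of order (e.g. from parallel translation workers).
parts = [(2, 'Click "Submit".'), (1, "Enter the account number.")]
print(assemble_manual(parts))
```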
4. The method according to claim 1, wherein the translating of the operation image using the translation strategy matched with the operation type to obtain the target translation content corresponding to the operation image includes:
extracting content information corresponding to the triggering operation from the operation image based on a content extraction strategy matched with the operation type; the content information comprises triggering object content information corresponding to the triggering operation;
and generating target translation content corresponding to the operation image based on the translation content template information matched with the operation type and the content information corresponding to the triggering operation.
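Claim 4 splits translation into extraction plus template filling: content information pulled from the image is substituted into translation-content template information matched to the operation type. A sketch under that reading follows; the template strings and field names are invented examples.

```python
# Sketch of claim 4's second step: fill a per-type translation-content
# template with the content information extracted from the operation image.

TEMPLATES = {
    "input": 'In the "{name}" field, enter "{value}".',
    "text_click": 'Click the "{text}" link.',
}

def render_translation(op_type, content_info):
    """content_info: dict of fields extracted from the operation image."""
    return TEMPLATES[op_type].format(**content_info)

print(render_translation("input", {"name": "Amount", "value": "100"}))
```

Keeping the wording in templates rather than code means the generated manual's phrasing can be revised without touching the extraction logic.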
5. The method of claim 4, wherein the extracting of the content information corresponding to the trigger operation from the operation image based on the content extraction strategy matched with the operation type comprises:
when the operation type is an input operation type, inputting the operation image into an image recognition model to obtain a target input frame name image and a target input frame image which correspond to the triggering operation in the operation image;
and respectively carrying out character recognition on the target input frame name image and the target input frame image to obtain an input frame name and input content information.
6. The method of claim 4, wherein the extracting of the content information corresponding to the trigger operation from the operation image based on the content extraction strategy matched with the operation type comprises:
identifying clicking text information corresponding to the triggering operation in the operation image under the condition that the operation type is a text clicking operation type;
and extracting a click icon image corresponding to the triggering operation in the operation image under the condition that the operation type is the icon click operation type.
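Claims 5 and 6 together describe a per-type dispatch over content extractors (input boxes via recognition-plus-OCR, text clicks via text recognition, icon clicks via image cropping). The dispatch can be sketched as below; the extractor bodies are stubs standing in for the image recognition and character recognition models, and the field names are hypothetical.

```python
# Sketch of the content-extraction dispatch in claims 5 and 6. Each extractor
# is a stub: a real one would crop regions of the screenshot and run OCR or
# an image recognition model on them.

def extract_input(image):
    # Would OCR the input-frame name image and input-frame image separately.
    return {"name": image["box_name"], "value": image["box_value"]}

def extract_text_click(image):
    # Would recognize the clicked text within the operation image.
    return {"text": image["clicked_text"]}

EXTRACTORS = {"input": extract_input, "text_click": extract_text_click}

def extract_content(op_type, image):
    """Apply the content-extraction strategy matched to the operation type."""
    return EXTRACTORS[op_type](image)

print(extract_content("text_click", {"clicked_text": "Confirm"}))
```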
7. The method according to claim 1, wherein the method further comprises:
displaying a chapter information input interface;
responding to the input completion operation of a user in the chapter information input interface, and acquiring a chapter image containing chapter information;
Identifying chapter information contained in the chapter image to obtain target translation content corresponding to the chapter image;
the generating an operation manual for the target application based on the target translation content includes:
and generating an operation manual of the target application program based on the target translation content corresponding to the operation image and the target translation content corresponding to the chapter image.
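Claim 7 adds chapter information recognized from a chapter-input interface, merged with the per-operation contents into one manual. A sketch of that merge is below; the interleaving rule (chapter heading first, then numbered steps) is an assumption for illustration.

```python
# Sketch of claim 7's merge: combine the translation content recognized from
# the chapter image with the per-operation translation contents.

def build_manual(chapter_content, operation_contents):
    """chapter_content: text recognized from the chapter image.
    operation_contents: ordered per-operation translation contents."""
    lines = [chapter_content]
    lines.extend(f"{i}. {c}" for i, c in enumerate(operation_contents, 1))
    return "\n".join(lines)

print(build_manual("Chapter 1: Logging in",
                   ["Enter your username.", 'Click "Login".']))
```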
8. An operation manual generation device, characterized in that the device comprises:
the monitoring module is used for monitoring triggering operation of a user in a display interface of a target application program and generating an operation image corresponding to the triggering operation based on current interface information of the display interface;
the classifying module is used for classifying the operation images by adopting an image classifying algorithm to obtain operation types corresponding to the operation images;
the translation module is used for translating the operation image by adopting a translation strategy matched with the operation type to obtain target translation content corresponding to the operation image;
and the first generation module is used for generating an operation manual of the target application program based on the target translation content.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any one of claims 1 to 7.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 7.
11. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 7.
CN202310177428.2A 2023-02-28 2023-02-28 Method, device, computer device and storage medium for generating operation manual Pending CN116090415A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310177428.2A CN116090415A (en) 2023-02-28 2023-02-28 Method, device, computer device and storage medium for generating operation manual

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310177428.2A CN116090415A (en) 2023-02-28 2023-02-28 Method, device, computer device and storage medium for generating operation manual

Publications (1)

Publication Number Publication Date
CN116090415A true CN116090415A (en) 2023-05-09

Family

ID=86199228

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310177428.2A Pending CN116090415A (en) 2023-02-28 2023-02-28 Method, device, computer device and storage medium for generating operation manual

Country Status (1)

Country Link
CN (1) CN116090415A (en)

Similar Documents

Publication Publication Date Title
CN103838566A (en) Information processing device, and information processing method
US20150113388A1 (en) Method and apparatus for performing topic-relevance highlighting of electronic text
CN105631393A (en) Information recognition method and device
CN114049631A (en) Data labeling method and device, computer equipment and storage medium
CN115917613A (en) Semantic representation of text in a document
CN115758451A (en) Data labeling method, device, equipment and storage medium based on artificial intelligence
CN115438740A (en) Multi-source data convergence and fusion method and system
CN117332766A (en) Flow chart generation method, device, computer equipment and storage medium
CN111602129B (en) Smart search for notes and ink
CN112182451A (en) Webpage content abstract generation method, equipment, storage medium and device
CN104391844A (en) Data management system and tool
CN112329409A (en) Cell color conversion method and device and electronic equipment
Yang et al. A large-scale dataset for end-to-end table recognition in the wild
CN116090415A (en) Method, device, computer device and storage medium for generating operation manual
CN115690821A (en) Intelligent electronic file cataloging method and computer equipment
US10970533B2 (en) Methods and systems for finding elements in optical character recognition documents
US9471569B1 (en) Integrating information sources to create context-specific documents
CN104899572A (en) Content-detecting method and device, and terminal
CN112417252B (en) Crawler path determination method and device, storage medium and electronic equipment
CN113762303B (en) Image classification method, device, electronic equipment and storage medium
US20240135739A1 (en) Method of classifying a document for a straight-through processing
US11367442B2 (en) Device and method with input
CN117975474A (en) Picture classification method and computing device
CN117521051A (en) Verification problem processing method, device, computer equipment and storage medium
Khade et al. An Interactive Floor Plan Image Retrieval Framework Based on Structural Features

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination