CN115115740A - Mind map recognition method, apparatus, device, medium and program product - Google Patents

Mind map recognition method, apparatus, device, medium and program product

Info

Publication number: CN115115740A
Application number: CN202210421961.4A
Authority: CN (China)
Prior art keywords: image, target, root node, mind map, map
Legal status: Pending
Inventor: 谷枫
Current and original assignee: Tencent Technology Shenzhen Co Ltd
Other languages: Chinese (zh)
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202210421961.4A
Publication of CN115115740A

Classifications

    All of the following fall under G (Physics), G06 (Computing; Calculating or Counting), G06T (Image Data Processing or Generation, in General):

    • G06T 11/206: Drawing of charts or graphs (2D image generation; drawing from basic elements, e.g. lines or circles)
    • G06T 11/60: Editing figures and text; Combining figures or text (2D image generation)
    • G06T 7/11: Region-based segmentation (image analysis; segmentation; edge detection)
    • G06T 7/187: Segmentation or edge detection involving region growing, region merging or connected component labelling (image analysis)
    • G06T 7/194: Segmentation or edge detection involving foreground-background segmentation (image analysis)
    • G06T 2207/20081: Training; Learning (indexing scheme for image analysis or image enhancement; special algorithmic details)
    • G06T 2207/30204: Marker (indexing scheme; subject or context of image processing)

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of this application disclose a mind map recognition method, apparatus, device, medium, and program product. The method includes: acquiring a target image containing a target mind map; in response to a restore operation for the target mind map, displaying a predicted root node of the target mind map in the target image; and in response to a confirmation operation on the predicted root node, outputting the target mind map, the target mind map being in an editable state. With this method and apparatus, the root node of a mind map can be identified automatically, improving the intelligence and accuracy of mind map restoration.

Description

Mind map recognition method, apparatus, device, medium and program product
Technical Field
The present application relates to the field of computer technology, and in particular, to a mind map recognition method, apparatus, device, medium, and program product.
Background
A mind map is a graphical tool that effectively expresses divergent thinking. With a mind map, large amounts of information can be analyzed and summarized efficiently, helping users clarify the relationships among pieces of information. In most cases, however, a mind map exists as an image, which does not lend itself to editing; how to restore a mind map from an image has therefore become a research topic.
Practice shows that existing mind map restoration approaches fall into two categories: 1. The user manually marks the root node of the mind map in the image, and the mind map is restored from the manually marked root node. This increases the user's workload, requires manual intervention throughout the restoration process, and, being insufficiently automated and intelligent, delivers a poor user experience. 2. The mind map is restored using a priori rules, for example designating the node at an absolute coordinate (such as (0, 0)) as the root node. The root node selected this way is wrong in most cases, and the approach only works in specific scenarios (e.g., the user is required to place the root node at a designated position), so its generalization ability is poor.
Disclosure of Invention
The embodiments of this application provide a mind map recognition method, apparatus, device, medium, and program product, which can automatically identify the root node of a mind map and improve the intelligence and accuracy of mind map restoration.
In one aspect, an embodiment of the present application provides a method for recognizing a mind map, where the method includes:
acquiring a target image containing a target mind map;
displaying a predicted root node in the target mind map in the target image in response to a restore operation for the target mind map;
and in response to a confirmation operation on the predicted root node, outputting the target mind map, the target mind map being in an editable state.
In another aspect, an embodiment of the present application provides a mind map recognition apparatus, including:
an acquisition unit configured to acquire a target image including a target mind map;
a processing unit for displaying a predicted root node in the target mind map in the target image in response to a restore operation for the target mind map;
and the processing unit is further configured to output the target mind map in response to a confirmation operation on the predicted root node, the target mind map being in an editable state.
In one implementation, the processing unit, when displaying the predicted root node in the target mind map in the target image, is specifically configured to:
display the predicted root node of the target mind map in the target image with an annotation;
the annotated display includes: displaying the predicted root node in a visually distinguished manner; or displaying the predicted root node with an annotation identifier;
where the annotation identifier is displayed in the region where the predicted root node is located, or is displayed in the target image as a callout.
In one implementation, when acquiring the target image containing the target mind map, the processing unit is specifically configured to:
display a function selection interface, the function selection interface including an image-to-mind-map option;
in response to triggering the image-to-mind-map option, display an image acquisition interface;
and acquire the target image containing the target mind map in the image acquisition interface.
In one implementation, the image acquisition interface includes an image upload option, and when acquiring the target image containing the target mind map in the image acquisition interface, the processing unit is specifically configured to:
in response to triggering the image upload option, display at least one candidate image;
and in response to selection of any candidate image from the at least one candidate image, take the selected candidate image as the target image and the mind map it contains as the target mind map.
In one implementation, the image acquisition interface includes an image scanning option, and when acquiring the target image containing the target mind map in the image acquisition interface, the processing unit is specifically configured to:
in response to triggering the image scanning option, perform a scanning operation on an image containing the target mind map to acquire the target image.
In one implementation, the processing unit is further configured to:
in response to an operation of reselecting the root node in the target image, cancel the annotated display of the predicted root node in the target image and display the newly selected root node with an annotation in the target image;
output the target mind map in response to a confirmation operation on the new root node;
where cancelling the annotated display of the predicted root node indicates that the predicted root node has not been confirmed.
In one implementation, the processing unit is further configured to:
updating the target mind map in response to an editing operation on the target mind map;
and in response to a sharing operation on the updated target mind map, sharing a shared image containing the updated target mind map.
In one implementation, when displaying the predicted root node of the target mind map in the target image in response to a restore operation for the target mind map, the processing unit is specifically configured to:
in response to the restore operation for the target mind map, call a target semantic segmentation model to perform root node prediction on the target image, obtaining an initial predicted root node of the target mind map;
determine the node distance between the initial predicted root node and each node of the target mind map;
and take the node with the shortest distance to the initial predicted root node as the predicted root node.
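As an illustration of this nearest-node selection, the following is a minimal sketch in Python (not part of the patent text; it assumes node centers are available as 2-D coordinates, e.g., from recovered node bounding boxes):

```python
import numpy as np

def select_predicted_root(initial_root_xy, node_centers):
    """Return the index of the node whose center has the shortest
    Euclidean distance to the initial predicted root coordinate."""
    initial = np.asarray(initial_root_xy, dtype=np.float32)
    centers = np.asarray(node_centers, dtype=np.float32)   # shape (N, 2)
    distances = np.linalg.norm(centers - initial, axis=1)  # node distances
    return int(np.argmin(distances))

# Hypothetical usage: the initial prediction lands between two nodes.
nodes = [(40, 120), (200, 80), (210, 160)]
print(select_predicted_root((195, 90), nodes))  # -> 1, the nearest node
```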
In one implementation, when calling the target semantic segmentation model to perform root node prediction on the target image to obtain the initial predicted root node of the target mind map, the processing unit is specifically configured to:
call the target semantic segmentation model to perform semantic segmentation on the target image, obtaining a target semantic label image corresponding to the target image, the target semantic label image including a foreground region;
take the target connected domain of the foreground region as the salient region of the target image;
and take the center point of the salient region as the initial predicted root node of the target mind map.
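A sketch of this salient-region step using OpenCV connected-component analysis (illustrative only; it assumes the semantic label image is a binary uint8 image with foreground pixels set to 255, and takes the largest foreground component as the target connected domain):

```python
import cv2
import numpy as np

def initial_root_from_label_image(label_image):
    """Take the largest connected component of the foreground as the
    salient region and return its centroid as the initial root node."""
    num, labels, stats, centroids = cv2.connectedComponentsWithStats(
        label_image, connectivity=8)
    if num < 2:  # label 0 is background; no foreground component found
        return None
    # Largest foreground component by pixel area (skip background row 0).
    target = 1 + int(np.argmax(stats[1:, cv2.CC_STAT_AREA]))
    cx, cy = centroids[target]  # center point of the salient region
    return float(cx), float(cy)
```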
In one implementation, the processing unit is further configured to:
acquire a training image, the training image containing a training mind map;
perform first semantic segmentation on the training mind map in the training image to obtain a first semantic label image; and,
call an initial semantic segmentation model to perform second semantic segmentation on the training mind map in the training image to obtain a second semantic label image;
obtain the loss function of the initial semantic segmentation model, and compute the loss value of the loss function from the first semantic label image and the second semantic label image;
if the loss value meets a preset condition, the initial semantic segmentation model has reached the convergence condition and is taken as the trained target semantic segmentation model;
and if the loss value does not meet the preset condition, continue iterative training of the initial semantic segmentation model.
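A minimal training-loop sketch of this procedure (illustrative, in PyTorch; the patent does not specify the loss function or the preset condition, so binary cross-entropy and a mean-loss threshold are assumptions here, as is the shape agreement between the model output and the first semantic label image):

```python
import torch
import torch.nn.functional as F

def train(model, loader, epochs=50, loss_threshold=1e-3, lr=1e-3):
    """Train the initial segmentation model until the loss value meets
    the preset condition (assumed here: mean loss below a threshold)."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        total = 0.0
        for image, first_label in loader:  # first semantic label image
            second_label = model(image)    # second semantic label image
            loss = F.binary_cross_entropy_with_logits(second_label, first_label)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            total += loss.item()
        if total / len(loader) < loss_threshold:
            break  # convergence condition reached
    return model   # trained target semantic segmentation model
```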
In one implementation, when performing the first semantic segmentation on the training mind map in the training image to obtain the first semantic label image, the processing unit is specifically configured to:
perform region extraction on the training image to obtain an attention image, the attention image including a salient region;
extract features from the attention image to obtain a feature map;
and redefine the class labels in the feature map according to the attention image to obtain the first semantic label image.
In one implementation, when performing region extraction on the training image to obtain the attention image, the processing unit is specifically configured to:
acquire the root node annotation file corresponding to the training image, the annotation file containing preset coordinate information of the root node of the training mind map;
and perform region extraction on the training image based on the coordinate information to generate the attention image, the attention image including a salient region centered on the coordinate position indicated by the coordinate information.
In one implementation, the salient region is a circular region centered on the coordinate position indicated by the coordinate information; when performing region extraction on the training image based on the coordinate information to obtain the attention image, the processing unit is specifically configured to:
acquire the display size information of the training image, and compute a radiation radius from the display size information;
determine the salient region in the training image from the coordinate information and the radiation radius;
and compute the thermal value of each feature point in the salient region to generate the attention image, where the thermal values of feature points within the salient region are greater than a thermal value threshold.
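A sketch of generating such an attention image (illustrative only: the radiation radius is assumed to be a fixed fraction of the shorter image side, and the thermal values are assumed to fall off as a Gaussian from the annotated root-node coordinate; the patent does not fix either choice):

```python
import numpy as np

def make_attention_image(height, width, root_xy, radius_ratio=0.1):
    """Build an attention image whose salient region is a circle centered
    on the annotated root-node coordinate; thermal values inside the
    circle are high, and everything outside is zero."""
    radius = radius_ratio * min(height, width)  # radiation radius from display size
    cx, cy = root_xy
    ys, xs = np.mgrid[0:height, 0:width]
    dist2 = (xs - cx) ** 2 + (ys - cy) ** 2
    attention = np.exp(-dist2 / (2.0 * (radius / 2.0) ** 2))  # thermal values
    attention[dist2 > radius ** 2] = 0.0  # keep only the circular salient region
    return attention.astype(np.float32)
```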
In one implementation, when redefining the class labels in the feature map according to the attention image to obtain the first semantic label image, the processing unit is specifically configured to:
replace the label value of the class label of each corresponding feature point in the feature map with the thermal value of that feature point in the salient region of the attention image, obtaining the first semantic label image.
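A sketch of this label-redefinition step (illustrative; it assumes the attention image and the feature-map label image share the same spatial size, and that feature points whose thermal value exceeds the threshold belong to the salient region):

```python
import numpy as np

def redefine_labels(label_map, attention, thermal_threshold=0.5):
    """Replace the class-label value of each feature point inside the
    salient region with its thermal value from the attention image,
    yielding the first semantic label image."""
    first_label = label_map.astype(np.float32).copy()
    salient = attention > thermal_threshold  # feature points in the salient region
    first_label[salient] = attention[salient]
    return first_label
```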
In one implementation, when calling the initial semantic segmentation model to perform the second semantic segmentation on the training mind map in the training image to obtain the second semantic label image, the processing unit is specifically configured to:
call the down-sampling module of the initial semantic segmentation model to down-sample the training image, obtaining a first-resolution image;
call the N stages of convolutional layers of the initial semantic segmentation model to extract features from the first-resolution image, obtaining N feature maps of different resolutions, N being an integer greater than 1;
call the feature fusion module of the initial semantic segmentation model to fuse the N feature maps, obtaining a target feature map;
and call the classification module of the initial semantic segmentation model to classify the target feature map, obtaining the second semantic label image.
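A minimal PyTorch sketch of a model with this structure (illustrative only; the layer counts, channel widths, and N = 3 are assumptions, not the patent's architecture, and the input size is assumed divisible by 16):

```python
import torch.nn as nn
import torch.nn.functional as F

class TinySegModel(nn.Module):
    """Down-sampling module, N = 3 convolution stages producing feature
    maps at decreasing resolutions, feature fusion, and per-pixel
    classification into foreground/background logits."""
    def __init__(self, channels=16):
        super().__init__()
        self.down = nn.Conv2d(3, channels, 3, stride=2, padding=1)  # down-sampling module
        self.stages = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, stride=2, padding=1)   # each stage halves resolution
            for _ in range(3)])
        self.classify = nn.Conv2d(channels, 1, 1)                   # classification module

    def forward(self, x):
        x = F.relu(self.down(x))   # features of the first-resolution image
        feats = []
        for stage in self.stages:
            x = F.relu(stage(x))
            feats.append(x)        # N feature maps of different resolutions
        size = feats[0].shape[-2:]
        fused = sum(F.interpolate(f, size=size, mode='bilinear',
                                  align_corners=False) for f in feats)  # feature fusion
        logits = self.classify(fused)
        return F.interpolate(logits, scale_factor=4.0, mode='bilinear',
                             align_corners=False)  # back to input resolution
```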
In one implementation, when calling the feature fusion module of the initial semantic segmentation model to fuse the N feature maps into the target feature map, the processing unit is specifically configured to:
up-sample those of the N feature maps whose resolution is below the resolution threshold, obtaining N feature maps of the same resolution;
and fuse the N same-resolution feature maps to obtain the target feature map.
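A sketch of this fusion step on its own (illustrative; the resolution threshold is simply taken here to be the largest resolution present, and element-wise addition is assumed as the fusion operation, though channel-wise concatenation would also fit the description):

```python
import torch
import torch.nn.functional as F

def fuse_feature_maps(feature_maps):
    """Up-sample every feature map below the target resolution, then
    fuse the same-resolution maps into a single target feature map."""
    target_size = max(f.shape[-2:] for f in feature_maps)
    resized = [f if tuple(f.shape[-2:]) == tuple(target_size)
               else F.interpolate(f, size=target_size, mode='bilinear',
                                  align_corners=False)
               for f in feature_maps]
    return torch.stack(resized).sum(dim=0)  # element-wise fusion

# Hypothetical usage: three maps at 1/4, 1/8 and 1/16 of the input size.
maps = [torch.randn(1, 16, 64, 64), torch.randn(1, 16, 32, 32),
        torch.randn(1, 16, 16, 16)]
print(fuse_feature_maps(maps).shape)  # torch.Size([1, 16, 64, 64])
```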
In another aspect, the present application provides a computer device comprising:
a processor for loading and executing a computer program;
a computer-readable storage medium, in which a computer program is stored; when executed by the processor, the computer program implements the mind map recognition method described above.
In another aspect, the present application provides a computer-readable storage medium storing a computer program adapted to be loaded by a processor to execute the mind map recognition method described above.
In another aspect, the present application provides a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the mind map recognition method described above.
In the embodiments of this application, in response to a restore operation for a target mind map, the computer device can predict the root node of the target mind map in the target image and display the predicted root node in the target image. Automatic prediction of the root node is thus achieved; compared with manually marking the root node, this avoids manual intervention in confirming the root node and reduces the user's workload. Further, in response to the user's confirmation of the predicted root node, the computer device can restore the complete target mind map based on the predicted root node; because restoration starts from an accurately predicted root node, the accuracy of the restored target mind map is improved. With this scheme, the complete target mind map can be restored quickly and accurately without manual intervention, improving restoration efficiency.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1a illustrates a schematic diagram of a mind map provided by an exemplary embodiment of the present application;
FIG. 1b illustrates a schematic diagram of a mind map provided by an exemplary embodiment of the present application;
FIG. 2 illustrates an architectural diagram of a mind map recognition system provided by an exemplary embodiment of the present application;
FIG. 3 illustrates a flow diagram of a mind map recognition method provided by an exemplary embodiment of the present application;
FIG. 4 is a schematic diagram illustrating a triggered display image capture interface provided by an exemplary embodiment of the present application;
FIG. 5a is a schematic diagram illustrating a method for obtaining a target image from a storage space through an image upload option according to an exemplary embodiment of the present application;
FIG. 5b is a schematic diagram illustrating a method for capturing an image of a target by shooting according to an exemplary embodiment of the present application;
FIG. 6a illustrates a schematic diagram of a reduction operation provided by an exemplary embodiment of the present application;
FIG. 6b is a schematic illustration of a reduction operation provided by an exemplary embodiment of the present application;
FIG. 6c illustrates a schematic diagram of a reduction operation provided by an exemplary embodiment of the present application;
FIG. 7a is a diagram illustrating a display of a prediction root node labeled in a target image according to an exemplary embodiment of the present application;
FIG. 7b is a diagram illustrating a method for displaying a prediction root node in a target image according to an exemplary embodiment of the present application;
FIG. 7c is a diagram illustrating a display of a prediction root node labeled in a target image according to an exemplary embodiment of the present application;
FIG. 8 is a diagram illustrating outputting a target mind map by triggering a confirmation operation in an image preview interface provided by an exemplary embodiment of the present application;
FIG. 9 illustrates a flow chart diagram of a mind map identification method provided by an exemplary embodiment of the present application;
FIG. 10 is a diagram illustrating a method for determining a predicted root node in a target mind map according to an exemplary embodiment of the present application;
FIG. 11 is a diagram illustrating Euclidean distances between an initial predicted root node and other nodes provided by an exemplary embodiment of the present application;
FIG. 12 illustrates a schematic diagram of a reselection of a new root node provided by an exemplary embodiment of the present application;
FIG. 13a is a schematic diagram illustrating a zoom target image provided by an exemplary embodiment of the present application;
FIG. 13b is a schematic diagram illustrating a zoom target image provided by an exemplary embodiment of the present application;
FIG. 13c is a schematic diagram illustrating a zoom target image provided by an exemplary embodiment of the present application;
FIG. 13d is a schematic diagram illustrating a sliding target image according to an exemplary embodiment of the present application;
FIG. 14 is a diagram illustrating an edited target mind map provided by an exemplary embodiment of the present application;
FIG. 15 is a diagram illustrating a sharing of edited target mind maps provided by an exemplary embodiment of the present application;
FIG. 16 illustrates a flow chart diagram of a mind map identification method provided by an exemplary embodiment of the present application;
FIG. 17 illustrates a schematic diagram of determining a first semantic label image corresponding to a training image provided by an exemplary embodiment of the present application;
FIG. 18a illustrates a semantic segmentation provided by an exemplary embodiment of the present application;
FIG. 18b illustrates a schematic diagram of semantic segmentation provided by an exemplary embodiment of the present application;
FIG. 19 is a diagram illustrating training an initial semantic segmentation model provided by an exemplary embodiment of the present application;
FIG. 20 is a schematic diagram illustrating a feature fusion process provided by an exemplary embodiment of the present application;
FIG. 21 is a schematic diagram illustrating an example of a mind map identifying apparatus according to an embodiment of the present application;
fig. 22 shows a schematic structural diagram of a computer device according to an exemplary embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments herein without creative effort shall fall within the protection scope of the present application.
The embodiments of the present application provide a mind map recognition scheme; some technical terms and concepts related to this scheme are briefly described below.
First, Artificial Intelligence (AI).
Artificial intelligence is a theory, method, technology, and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and produce new intelligent machines that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines can perceive, reason, and make decisions. It is a comprehensive discipline covering a wide range of fields, at both the hardware and the software level. Basic artificial intelligence technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
This application mainly involves Machine Learning (ML) within artificial intelligence. Machine learning is a multidisciplinary field drawing on probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other subjects; it studies how computers can simulate or implement human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent; it is applied throughout the field. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from instruction. Machine learning can be viewed as a task whose goal is to let machines (computers in a broad sense) learn human-like intelligence; for example, just as humans learn to play Go, a computer program (such as AlphaGo or AlphaGo Zero) can be designed to master the game. Many methods can accomplish machine learning tasks, including neural networks, linear regression, decision trees, support vector machines, Bayesian classifiers, reinforcement learning, probabilistic graphical models, and clustering.
Second, the mind map (The Mind Map).
A mind map, also called a thinking map, is a graphical tool that effectively expresses divergent thinking. By expressing complex, rich information divergently with a mind map tool, a mind map representing that information can be obtained; the user can then analyze and summarize the information through the mind map, which helps quickly clarify the relationships among the pieces of information and makes complex information easy to remember and understand. A mind map tool may include: a dedicated mind map application that supports building or editing mind maps; or a mind map service provided within an application (such as the mind map service of a document application); and the like. An application here refers to a computer program for performing one or more tasks. ① Classified by how they run, applications may include, but are not limited to: clients installed in a terminal, applets that can be used without download or installation, web applications opened through a browser, and the like. ② Classified by function, applications may include, but are not limited to: IM (Instant Messaging) applications, content interaction applications, document applications, and the like. An instant messaging application is an internet-based application for instant messaging and social interaction, and may include, but is not limited to: social applications with communication functions, map applications with social interaction functions, game applications, and so on. A content interaction application is an application enabling content interaction, for example internet banking, sharing platforms, personal spaces, and news applications. A document application is an application with document editing capability, such as an online document or collaborative document application. The embodiments of the present application do not limit the specific type of application that supports building or editing mind maps.
Further, a mind map mainly consists of a root node, child nodes, and mind map connecting lines. Specifically, the root node is the starting node (or central node) of the mind map and, as its thinking center, usually expresses the core topic (or central topic). Starting from this center, any number of child nodes may diverge outward, each representing a link to the central topic; each child node may in turn become the next central topic from which further child nodes diverge, forming a radial structure. It should be understood that this application does not limit the names of the nodes and lines in a mind map: the root node may also be called a core node, and the child nodes may also be called leaf nodes, etc. An exemplary mind map is illustrated in FIG. 1a. As shown in FIG. 1a, the mind map starts from a root node 101 and diverges rightward into 4 child nodes: child node 1, child node 2, child node 3, and child node 4. Each child node can become a next-level root node and continue to diverge outward; for example, child node 1 diverges rightward into child node 1.1 and child node 1.2, and child node 2 diverges rightward into child node 2.1, child node 2.2, and child node 2.3. By analogy, complex information can be expressed hierarchically through this process to obtain a mind map.
It should be noted that FIG. 1a illustrates only an exemplary mind map, in which each node is represented as a text block (or image block). Nodes in a mind map may also be represented in other forms, such as line endpoints; an exemplary mind map whose nodes are line endpoints is shown in FIG. 1b, where the root node is a line endpoint and the core topic it represents is displayed on the connecting line between two endpoints. The embodiments of the present application do not limit the specific style and structure of the mind map.
In practice, a constructed mind map usually exists in the form of an image (or picture); when it is shared, an image is generated from the mind map and the image itself is shared. However, an image by itself does not support editing of its content, so the user cannot perform editing operations (such as adding or deleting nodes) on the mind map contained in the image. Therefore, when the user needs to edit a mind map contained in an image, the mind map can be restored using a restoration technique, yielding a mind map on which editing operations are allowed. In other words, restoring the mind map in a target image means converting the mind map from a non-editable state into an editable state, so that the user can edit and update the mind map as required. As described above, a mind map is a divergent structure that spreads outward from its root node; the main idea of restoring a mind map contained in an image is therefore: first identify the root node of the mind map, then, taking the root node as the starting node (or starting point), search for the child nodes related to it in a depth-first traversal, advancing layer by layer until the complete structure of the mind map is restored and an editable mind map can be presented to the user.
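As a sketch of that traversal idea (illustrative only; it assumes the node texts and parent-child links have already been recovered from the image and are available as an adjacency mapping):

```python
def restore_mind_map(node, children_of):
    """Depth-first traversal from the identified root node, rebuilding
    the mind map structure layer by layer as a nested dictionary."""
    return {"topic": node,
            "children": [restore_mind_map(child, children_of)
                         for child in children_of.get(node, [])]}

# Hypothetical adjacency recovered from image analysis (cf. FIG. 1a):
edges = {"root": ["child 1", "child 2"],
         "child 1": ["child 1.1", "child 1.2"],
         "child 2": ["child 2.1", "child 2.2", "child 2.3"]}
print(restore_mind_map("root", edges))
```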
Based on the above, quickly and accurately determining the root node of a mind map during restoration is of great significance for restoring the complete mind map; the embodiments of this application therefore focus on how to identify or predict the root node of a mind map. Specifically, considering that neural networks in artificial intelligence have good generalization ability, the embodiments of this application use a neural network model (or simply a neural network) to quickly and accurately identify the root node, thereby improving the efficiency and accuracy of restoring the whole mind map. The neural network model may be a semantic segmentation model (or image semantic segmentation model), which classifies each pixel of an image so as to distinguish a salient region from the other regions. The salient region of an image here may refer to the region that human vision attends to (a region of interest), and the other regions are the regions of the image outside the salient region. Of course, the embodiments of this application do not limit the specific network model; other network structures with the same function may be used instead of a semantic segmentation model.
When a human views an image, the brain automatically engages a visual attention mechanism to process the salient region of the image; this mechanism, often simply called attention, is the mechanism by which human vision selectively focuses on part of the available information (e.g., part of the information contained in an image) while ignoring the rest. A machine-trained neural network, by contrast, learns feature information from the whole image: all features look alike to the network, which processes them equivalently without paying particular attention to any region. The mind map recognition scheme provided by this application therefore aims to imitate human visual behavior by explicitly selecting the parts of the input image that deserve attention, or by assigning different weights to different parts of the input image, so that the network's attention is focused on the regions that matter (for example, the root node of the mind map) and attention to other information is reduced. In this way the neural network gains the ability to extract the salient region of an image (for example, the region where the root node of the mind map is located).
In a specific implementation, the general principle of the mind map recognition scheme provided by the embodiments of this application is as follows. First, an initial semantic segmentation model (i.e., a semantic segmentation model to be trained) is trained to obtain a target semantic segmentation model (i.e., a trained semantic segmentation model); the target model has good generalization ability and supports restoring mind maps of different styles and structures, which extends the applicable scenarios of the scheme. Second, in response to a user's request to restore the target mind map in a target image, the target semantic segmentation model performs semantic segmentation on the target image and identifies the predicted root node of the target mind map; the root node can thus be predicted quickly and accurately without manual intervention, improving restoration efficiency. Third, the identified predicted root node is displayed in the target image so that the user can confirm the prediction made by the model. Finally, in response to the user's confirmation of the predicted root node, the whole target mind map is restored from that root node and output, so that the user can edit the restored mind map.
To better understand the mind map recognition scheme provided by the embodiments of this application, the recognition scenario involved is briefly described below in conjunction with the mind map recognition system shown in FIG. 2. As shown in FIG. 2, the system includes a computer device 201 and a computer device 202; the embodiments of this application do not limit the number or naming of these devices. The computer device 201 may be a terminal on which a target application (e.g., a mind map tool) with mind map restoration capability is deployed, and may include, but is not limited to: smartphones (such as Android or iOS phones), tablet computers, portable personal computers, smart voice interaction devices, Mobile Internet Devices (MID), smart appliances, vehicle terminals, aircraft, head-mounted devices, and other smart devices with touch screens. The computer device 202 may be a server corresponding to the computer device 201, providing computing and application support for it; the server may include, but is not limited to, a data processing server, a Web server, an application server, or another server with substantial computing capability, and may be an independent physical server or a server cluster or distributed system composed of multiple physical servers. The computer device 201 and the computer device 202 may be directly or indirectly connected for communication by wired or wireless means; the embodiments of this application do not limit the connection manner.
The mind map recognition scheme provided by the embodiments of this application may be executed by the aforementioned computer device 201 or computer device 202, or jointly by both; for convenience, the description below takes joint execution by the computer device 201 and the computer device 202 as an example. In a specific implementation, the computer device 202 may obtain a training data set and train the initial semantic segmentation model to obtain the trained target semantic segmentation model. The trained model can be deployed on the computer device 201, so that when the computer device 201 responds to a restore operation on a target image containing a target mind map, it calls the target semantic segmentation model to perform prediction on the target image and quickly and accurately predicts the root node of the target mind map. Furthermore, in response to the user's confirmation of the predicted root node, the target mind map can be traversed from that node to restore the complete target mind map, which allows editing operations. With this scheme, the root node of the target mind map is predicted automatically, and the complete mind map is then restored automatically and accurately without manual intervention, reducing the user's workload, improving restoration efficiency, and improving the user experience.
It should be noted that the target semantic segmentation model trained by the computer device 202 may also be deployed on the computer device 202 itself rather than on the computer device 201. In this implementation, the computer device 202 may receive the target image containing the target mind map sent by the computer device 201, call the trained target semantic segmentation model to restore the target mind map, and return the restored complete target mind map to the computer device 201, which outputs it so that the user can edit it.
As described above, the mind map recognition scheme provided by the embodiments of this application mainly involves two aspects: on one hand, model training to obtain the target semantic segmentation model, and on the other hand, model application using the trained target semantic segmentation model. A more detailed mind map recognition method proposed by the embodiments of this application is described below with reference to the drawings.
FIG. 3 illustrates a flow diagram of a mind map recognition method provided by an exemplary embodiment of the present application. The method shown in FIG. 3 mainly concerns the model application part, may be executed by a computer device (such as the computer device 201), and may include, but is not limited to, steps S301 to S303:
s301: a target image containing a target mind map is acquired.
When the user needs to restore the target mind map contained in a target image, the user may acquire that image using a computer device (e.g., the computer device 201). Take as an example a target application (such as a document application) with mind map restoration deployed on the computer device: in a specific implementation, the user opens the target application, which displays a function selection interface including an image-to-mind-map option. When the user triggers the image-to-mind-map option, indicating a wish to restore a mind map, the computer device responds by displaying an image acquisition interface; the user can then supply the target image containing the target mind map to be restored in this interface, i.e., the computer device acquires the target image in the image acquisition interface.
For ease of understanding, a schematic diagram of triggering the display of the image acquisition interface is given below in conjunction with FIG. 4. As shown in FIG. 4, in response to an operation of restoring a mind map, the computer device may display a function selection interface 401 provided by the target application; the interface 401 includes one or more options, among them an image-to-mind-map option 402. When any option is selected (e.g., clicked), the function corresponding to it is triggered. In response to triggering the image-to-mind-map option 402, an image acquisition interface 403 may be displayed, in which the target image containing the target mind map can be acquired. It should be noted that, depending on the target application, the content and style of the function selection interface and/or the image acquisition interface may differ from those shown in FIG. 4; FIG. 4 merely shows an exemplary process of triggering the image acquisition interface from the function selection interface.
This application supports multiple implementations for acquiring the target image containing the target mind map in the image acquisition interface; in other words, more than one path may be taken. For example: the image acquisition interface includes an image upload option (or key, button, control, etc.), through which a target image in a storage space is uploaded to the interface; or the image acquisition interface includes an image scanning option, through which the target image is acquired by scanning. These two implementations are explained in more detail below.
in one implementation, the target image is acquired via an image upload option in the image acquisition interface. In a specific implementation, the image acquisition interface includes an image uploading option, and when the user object performs a trigger operation on the image uploading option, the user object indicates that the user object wants to restore the target image in the storage space, and the computer device may display at least one candidate image included in the storage space in response to the trigger on the image uploading option. Then, the user object may select any one of the at least one candidate image, and the computer device may set any one of the candidate images selected by the user object as a target image and set a mind map included in the any one of the candidate images as a target mind map included in the target image. Wherein, the aforementioned storage space may include: a local storage space of a computer device, or a cloud storage space of a user object, etc.; the embodiment of the present application does not limit the specific type of the storage space. By the method for acquiring the target image from the storage space, the target image to be restored can be determined by the user object from the storage space quickly, and the restoration experience of the user object on the target mind map is improved.
An exemplary schematic diagram of retrieving a target image from a storage space through an image upload option may be seen in fig. 5a, as shown in fig. 5a, an image upload option 5011 is included in an image retrieval interface 501, and in response to a trigger to the image upload option 502, at least one candidate image, such as candidate image 502, candidate image 503, candidate image 504, … …; in response to selection of any one of the at least one candidate image (e.g., candidate image 502), determining that the selected candidate image 502 is to be the target image, the selected target image is displayed in image acquisition interface 501, such that the user object reconfirms whether the displayed candidate image 502 is to be the target image in image acquisition interface 501. If the user object is not satisfied with the candidate image 502 displayed in the image acquisition interface, the user object may perform the operation of selecting the target image from the candidate images again, satisfying the requirement that the user object selects the target image multiple times.
In another implementation, the target image is acquired via the image scanning option in the image acquisition interface. In a specific implementation, when the user triggers the image scanning option, indicating a wish to obtain the target image by scanning, the computer device may, in response, perform a scanning operation on an image containing the target mind map to obtain the target image. After the scan yields the target image containing the target mind map, the image can be displayed in the image acquisition interface so that the user can confirm it. If the user is not satisfied with the scanned image, for example because it is unclear or incomplete, the user can scan again until the acquired image meets the requirements.
It should be noted that the image to be scanned may be displayed on the computer device itself or on another device, and the scanning operation differs accordingly. Optionally, the image to be scanned is displayed on the terminal screen of the computer device; in this case, the scanning operation may be the computer device recognizing the target image from its own screen, which enables scanning of images already on the device and enriches the ways of acquiring the target image. Optionally, the image to be scanned is displayed on the terminal screen of a device other than the computer device; in this case, the computer device responds to the trigger on the image scanning option by opening its camera and scanning the image shown on the other device's screen. When the image does not reside on the computer device, the target image can thus be obtained through the two devices together, further enriching the acquisition methods and meeting multi-scenario needs.
Further, when scanning the target image containing the target mind map with the computer device, the scan may be deemed complete when: the scanning duration reaches a duration threshold (for example, if the threshold is 3 seconds and 3 seconds of scanning are detected, the scan is deemed complete and the scanned image is displayed in the image acquisition interface as the target image); or the scanned image is determined to meet the scanning requirements (for example, its clarity meets the clarity requirement, or the content it contains is complete); or a capture key is detected to have been triggered, in which case the captured image is displayed in the image acquisition interface as the target image.
An exemplary schematic diagram of capturing the target image by shooting is shown in FIG. 5b. As shown in FIG. 5b, the image acquisition interface 501 includes an image scanning option 5012; in response to triggering it, the camera of the computer device (such as a smartphone) may be turned on and pointed at the display screen (or terminal screen) of another device (such as a computer) to capture the image displayed there. When the capture key 505 (or a physical key of the computer device) is triggered, the captured image is determined to be the target image and is displayed in the image acquisition interface.
It should be noted that the above are only some exemplary implementations of acquiring the target image given in the embodiments of the present application, and the embodiments of the present application do not limit which specific manner is used to acquire the target image.
S302: in response to a restore operation for the target mind map, a predicted root node in the target mind map is displayed in the target image.
Based on the implementation of step S301, after the target image containing the target mind map is acquired in the image acquisition interface, if the user decides to restore the target mind map contained in the displayed target image, the user may perform a restore operation in the image acquisition interface. The restore operation may include, but is not limited to: a trigger operation on a completion option; a gesture operation tracing a preset gesture (such as drawing an "S" shape or an "L" shape, or double-clicking); or a voice input operation indicating restoration; and so on. Several exemplary implementations of the restore operation are given below.
Optionally, the restore operation includes a trigger operation on a completion option. As shown in FIG. 6a, a completion option 601 is included in the image acquisition interface; triggering the completion option 601 indicates that the user confirms restoration of the target mind map currently displayed in the interface, so a restore operation for the target mind map is deemed to exist and prediction of the root node begins.
Optionally, the restore operation includes a gesture operation performing a preset gesture (e.g., drawing an "S" shape). As shown in fig. 6b, when the computer device detects that the movement trajectory of a movement operation on the terminal screen is "S"-shaped, it determines that a restore operation for the target mind map exists and begins prediction processing of the root node in the target mind map.
Optionally, the restore operation includes a voice input operation instructing the restoration. As shown in fig. 6c, the recording function of the computer device (e.g., its microphone) is turned on for the whole time the image acquisition interface is displayed on the terminal screen. While the image acquisition interface is displayed, the computer device may collect sound signals from the surrounding environment in real time and recognize them; when the recognized content of a sound signal is an instruction to restore the target mind map, the computer device determines that a restore operation for the target mind map exists and begins prediction processing of the root node in the target mind map.
Further, in response to the restore operation for the target mind map, the computer device can perform prediction processing on the root node in the target mind map and, after obtaining the predicted root node, switch from the image acquisition interface to an image preview interface; the target image is displayed in the image preview interface, and the located predicted root node of the target mind map is displayed in the target image. Displaying the predicted root node in the target image intuitively shows the user object where the predicted root node lies, so that the user object can see its position at a glance. Of course, the image preview interface and the image acquisition interface may also be the same interface, which is not limited in this embodiment of the present application.
In a specific implementation, the predicted root node in the target mind map can be annotated in the target image. Annotating the predicted root node may include, but is not limited to: displaying the predicted root node in the target image in a visually distinguished manner; or displaying the predicted root node in the target image in the form of an annotation identifier. These exemplary annotation implementations are explained in detail below with reference to the accompanying drawings.
First, the predicted root node of the target mind map is displayed in the target image in a visually distinguished manner. Visual display may also be called visual highlighting, and is intended to intuitively show the user object the position of the predicted root node by making it stand out. Visually displaying the predicted root node may include: displaying the predicted root node in a brighter or more vivid color than the other nodes in the target mind map. Assuming the nodes of the target mind map are white, as shown in fig. 7a, after the predicted root node is predicted, its display color may be changed to gray or black to visually distinguish it from the other nodes, so that the user object can directly see its position. Alternatively, the predicted root node is displayed at a larger or smaller size than the other nodes in the target mind map. Alternatively, the predicted root node is displayed dynamically while the other nodes are displayed statically; for example, the predicted root node may be vibrated at a certain frequency. The embodiment of the present application does not limit the specific manner of visually displaying the predicted root node.
Second, the predicted root node is displayed in the target image in the form of an annotation identifier; the annotation identifier represents the predicted root node in the target mind map. In other words, the annotation identifier displayed in the target image indicates the position of the predicted root node in the target image. The display style of the annotation identifier may vary with its position in the target image. For example, the annotation identifier can be displayed as a highlighted red (or other color) dot within the area where the predicted root node is located (e.g., the highlighted small red dot 701 shown in FIG. 7b). For another example, the annotation identifier may be displayed in the target image in callout form; as shown in fig. 7c, it may include a callout box with a leader line extending from the root node, marking the node at the end of the leader line as the predicted root node.
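A minimal sketch of the highlighted-dot annotation style follows: draw a filled red dot at the predicted root node's coordinates. The function name, radius, and example coordinates are illustrative assumptions; OpenCV uses BGR channel order, so red is (0, 0, 255).

```python
import cv2

def annotate_predicted_root(image, root_xy, radius=8):
    annotated = image.copy()
    cv2.circle(annotated, root_xy, radius, color=(0, 0, 255), thickness=-1)
    return annotated

# e.g. annotate_predicted_root(target_image, (412, 236)) returns a copy of the
# target image with a red dot over the predicted root node, as in fig. 7b.
```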
It can be understood that the embodiment of the present application does not limit the specific manner of displaying the predicted root node in the target image; the foregoing is only illustrative.
S303: in response to a confirmation operation on the predicted root node, the target mind map is output.
As described above, after the computer device uses the algorithm to locate the predicted root node in the target mind map, it displays the predicted root node in the target image. Displaying the predicted root node lets the user object intuitively see where the located predicted root node is and gives the user object the authority to confirm it. In other words, after the predicted root node of the target mind map is located by the algorithm, the embodiment of the application further lets the user object confirm the prediction; only when the user object confirms that the predicted root node is the correct root node of the target mind map is the subsequent step of restoring the whole target mind map executed. The restored target mind map is therefore based on the correct root node, which improves its accuracy and, to a certain extent, enables efficient and high-quality restoration of the target mind map.
In a specific implementation, in addition to the annotated predicted root node, the image preview interface further includes a confirmation option. When the user object triggers the confirmation option in the image preview interface, it is determined that a confirmation operation for the predicted root node exists, which triggers the step of restoring the whole target mind map. After the complete target mind map is restored, it can be output in an editable state. In this way, the non-editable target mind map in the target image is converted into an editable target mind map, so that the user object can edit it, meeting the user object's editing needs. An exemplary schematic diagram of outputting the target mind map by triggering a confirmation operation in the image preview interface is shown in fig. 8, where the image preview interface includes a confirmation option 801; triggering the confirmation option 801 indicates that the user object has determined that the predicted root node is the correct root node of the target mind map, whereupon the computer device invokes the algorithm to restore the entire target mind map based on the predicted root node and outputs the restored target mind map.
In the embodiment of the application, in response to the restore operation for the target mind map, the computer device can perform prediction processing on the root node of the target mind map in the target image and display the predicted root node in the target image. The root node of the target mind map is thus predicted automatically; compared with manually marking the root node, this avoids manual intervention in confirming the root node and reduces the workload of the user object. Further, in response to the user object's confirmation of the predicted root node, the computer device can restore the complete target mind map based on the predicted root node; restoring from an accurately predicted root node improves the accuracy of the restored target mind map. With this scheme, the complete target mind map can be restored quickly and accurately without manual intervention, improving restoration efficiency.
FIG. 9 illustrates a flowchart of a mind map recognition method provided by an exemplary embodiment of the present application. The method shown in fig. 9 mainly relates to the model application part and can be executed by a computer device (such as the computer device 201); it may include, but is not limited to, steps S901-S904:
S901: a target image containing a target mind map is acquired.
S902: in response to the restore operation for the target mind map, the target semantic segmentation model is called to perform root node prediction processing on the target image, and the predicted root node of the target mind map is displayed in the target image.
For steps S901-S902, the specific implementation of acquiring the target image containing the target mind map and of the restore operation for the target mind map can refer to the description of steps S301-S302 in the embodiment shown in fig. 3, and is not repeated herein.
As described above, the restore operation for the target mind map may include a trigger operation by the user object on the completion option in the image acquisition interface. In response to the restore operation, the computer device can call the trained target semantic segmentation model to perform root node prediction processing on the target image, obtain the predicted root node of the target mind map, and then display the predicted root node in the target image so that the user object can confirm the computer device's prediction.
For ease of understanding, a detailed description of how a computer device predicts the root node of the target mind map in response to a restore operation is provided below in connection with FIG. 10; the process may include steps s11-s12:
s11: call the target semantic segmentation model to perform root node prediction processing on the target image, obtaining an initial predicted root node in the target mind map. As shown in fig. 10, the target image containing the target mind map is first input into the trained target semantic segmentation model, which performs semantic segmentation on the target image to obtain a target semantic label image (also called a score map) corresponding to the target image. The target semantic label image is a grayscale image and comprises a foreground region and a background region. The foreground region is composed of adjacent pixel points with the same non-zero pixel value and represents the region where the target semantic segmentation model has identified the root node of the target mind map; the background region is composed of adjacent pixel points whose pixel value is 0 and generally refers to the region of the target image other than the foreground region.
Then, the target connected domain of the foreground region in the target semantic label image is computed, and this target connected domain is taken as the saliency region of the target image, that is, the region of the target image in which the user object is interested. The target connected domain may be the maximum connected domain of the foreground region, which refers to the largest region of the foreground composed of adjacent pixel points with the same pixel value. As shown in fig. 10, the maximum connected domain of the foreground region is computed and taken as the saliency region, displayed as a white area. Determining the saliency region quickly narrows the root node of the target mind map down to an approximate range within the target image, so that its exact position can then be found within the saliency region.
Finally, the center point of the saliency region is taken as the initial predicted root node of the target mind map. It is understood that the saliency region of a target image is often an irregular closed region; to find the center point of such a region, the embodiment of the application provides an optional approach: first generate the contour of the saliency region, then compute the minimum bounding rectangle of the contour, i.e., the smallest rectangle that completely contains the saliency region, and finally compute the center point of that rectangle and take it as the initial predicted root node. As shown in fig. 10, the white area in the target semantic label image is the saliency region and its minimum bounding rectangle is a square; the intersection of the square's two diagonals is its center point, and the position of that center point is taken as the position of the initial predicted root node of the target mind map.
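A minimal sketch of this post-processing for step s11 follows, assuming the model's output is a binary label image whose foreground pixels are non-zero: keep the largest connected component as the saliency region, then return the center of its minimum bounding rectangle (taken here as axis-aligned) as the initial predicted root node. The OpenCV-based implementation and function name are illustrative.

```python
import cv2
import numpy as np

def initial_root_from_label(label_image: np.ndarray) -> tuple:
    mask = (label_image > 0).astype(np.uint8)
    n, components, stats, _ = cv2.connectedComponentsWithStats(mask, connectivity=8)
    if n < 2:
        raise ValueError("no foreground region was predicted")
    # Component 0 is the background; keep the largest foreground component.
    largest = 1 + int(np.argmax(stats[1:, cv2.CC_STAT_AREA]))
    saliency = (components == largest).astype(np.uint8)
    # Contour of the saliency region, then its minimum bounding rectangle.
    contours, _ = cv2.findContours(saliency, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    x, y, w, h = cv2.boundingRect(contours[0])
    return (x + w // 2, y + h // 2)   # center point = initial predicted root node
```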
As described above, the saliency region of the target semantic label image represents the region in which the user object is interested and in which the root node of the target mind map lies. Thus, when the center point of the saliency region is taken as the initial predicted root node, the initial predicted root node indicates only the approximate location of the root node within the target image. To obtain the exact location of the root node, the embodiment of the present application further performs step s12, determining an exact predicted root node based on the initial predicted root node.
s12: determine the node distance between the initial predicted root node and each node of the target mind map, and take the node with the shortest distance to the initial predicted root node as the predicted root node. Specifically, the target image containing the target mind map may be input into a node detection model, a network model that recognizes all nodes of the target mind map in the target image equivalently; equivalent recognition simply means that the hierarchical relationship between nodes (e.g., root node versus child nodes) is not distinguished, and all nodes are recognized alike. The node distance between the initial predicted root node located in step s11 and each node identified by the node detection model is then calculated, and the node with the minimum distance is determined as the predicted root node of the target mind map.
The node distance between the initial predicted root node and any node may be the Euclidean distance, i.e., the straight-line distance between the two. As shown in fig. 11, a coordinate system can be established with, for example, the upper left corner of the target image as the origin, giving coordinate information for the initial predicted root node and for each node of the target mind map; the Euclidean distance between the initial predicted root node and each node is then calculated from these coordinates. If the Euclidean distance from the initial predicted root node is 0.2 cm to node 1, 1 cm to node 2, 1 cm to node 3, 1.2 cm to node 4, and so on, the distance to node 1 is the shortest, so node 1 is determined as the predicted root node of the target mind map.
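A minimal sketch of this nearest-node selection for step s12 follows, assuming the node detection model returns the center coordinates of every node it recognizes; the function name is illustrative.

```python
import math

def pick_predicted_root(initial_root, node_centers):
    # The node nearest (by Euclidean distance) to the initial predicted root
    # node becomes the predicted root node of the target mind map.
    return min(node_centers,
               key=lambda p: math.hypot(p[0] - initial_root[0],
                                        p[1] - initial_root[1]))

# e.g. pick_predicted_root((120, 80), [(118, 82), (300, 90), (305, 210)])
# returns (118, 82), the detected node nearest the initial prediction.
```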
In summary, the target semantic segmentation model predicts an initial predicted root node that indicates the approximate position of the root node in the target mind map; the Euclidean distances between the initial predicted root node and each node are then calculated to accurately determine the predicted root node among all nodes of the target mind map, and the restoration then proceeds from this accurate predicted root node. The predicted root node of the target mind map is thus located automatically, so the whole restoration process runs automatically and intelligently, improving restoration efficiency.
S903: in response to a confirmation operation on the predicted root node, the target mind map is output.
It should be noted that for the specific implementation of step S903, reference may be made to the description of step S303 in the embodiment shown in fig. 3, which is not repeated herein.
S904: in response to an operation of reselecting a root node in the target image, the newly selected root node is displayed in the target image.
As described above, after the computer device finishes predicting the root node of the target mind map, it displays the predicted root node in the target image so that the user object can judge whether the prediction is accurate. If the user object approves the predicted root node predicted by the computer device (specifically, the algorithm), i.e., confirms that it is indeed the root node of the target mind map, the user object can trigger the completion option in the image acquisition interface; it is then determined that a confirmation operation for the predicted root node exists, and the computer device proceeds with the subsequent steps of restoring the entire target mind map based on the predicted root node. If the user object does not approve the predicted root node, i.e., considers its position wrong, the embodiment of the application further supports the user object reselecting the root node in the target image, so that the computer device executes the subsequent restoration steps with the new root node reselected by the user object. Through this process, whether or not the predicted root node identified by the computer device is wrong, the subsequent restoration of the whole target mind map always proceeds from the correct root node, which guarantees the accuracy of the restored target mind map.
In a specific implementation, when the user object judges that the predicted root node is wrong, the user object can reselect the root node in the target image. In response to the reselection operation, the computer device cancels the display of the predicted root node in the target image (specifically, its annotation) and displays the newly selected root node in the target image (specifically, annotates it); for the annotation style of the new root node, reference may be made to the foregoing description of annotating the predicted root node, which is not repeated herein. An exemplary schematic of reselecting a new root node is illustrated in FIG. 12. As shown in fig. 12, after the computer device predicts the root node of the target mind map, the predicted root node may be annotated in the image preview interface 1201, and prompt information 1202 is displayed, prompting that the user object may reselect a new root node if it judges the predicted root node to be incorrect. Further, when the user object determines that the predicted root node is incorrect, a point selection option 1203 in the image preview interface 1201 may be triggered; the annotation of the predicted root node is then deleted from the target image, and the point selection option 1203 is highlighted to indicate that the user object may begin reselecting the root node. The user object can then click any node in the target image, and in response to the click, the computer device determines the clicked node as the newly reselected root node and annotates it in the target image.
The embodiment of the application also supports undoing the user object's click selection of a node; this allows the user object to select the root node multiple times, improves the flexibility of root node selection, and further improves the user object's experience. With continued reference to fig. 12, the image preview interface also includes an undo option 1204. When the user object triggers the undo option 1204, indicating that the user object wants to cancel the click operation closest to the current time (i.e., the time at which the undo option is triggered), the annotation of the most recently clicked node is removed from the target image, and the root node determined before that click is displayed again; that root node may be the predicted root node or a root node previously selected by the user.
In addition, considering that the display area of the computer device's terminal screen is limited, the target mind map displayed on the terminal screen may be unclear when it has many nodes or much content. The embodiment of the application therefore also supports the user object adjusting the target image in the image preview interface, so that the target mind map in the adjusted target image is displayed more clearly, meeting the user object's need to browse every part of the target mind map. Adjusting the target image may include: adjusting the display size of the target image in the image preview interface, and/or adjusting which portion of the target image is displayed in the image preview interface (e.g., the partial image shown on the terminal screen). The execution order of these two adjustments is not limited in the embodiment of the present application; for example, the display size may be adjusted first and the displayed portion second. Several exemplary implementations of adjusting the target image in the image preview interface are given below.
Optionally, a gesture operation is performed in the image preview interface to adjust the display size of the target image, thereby adjusting the display size of the target mind map it contains. As shown in fig. 13a, in response to a two-finger pinch operation in the image preview interface, the computer device shrinks the target image according to the pinch until its display area matches the area reserved for displaying the target image in the image preview interface. Similarly, as shown in fig. 13b, in response to a two-finger spread operation in the image preview interface, the computer device may enlarge the target image according to the spread until its display area reaches a preset display area threshold; as the target image is enlarged, the target mind map it contains is enlarged too, so the user object can browse its content more clearly. Of course, the two-finger pinch and spread operations are only exemplary gestures for adjusting the target image; the embodiments of the present application do not limit this.
Optionally, the display size of the target image is adjusted through controls (also called components, options, or buttons), thereby adjusting the display size of the target mind map it contains. As shown in fig. 13c, the image preview interface further includes an enlarge control 1301 and a shrink control 1302; triggering either control indicates that the user object wants the corresponding function applied to the target image. For example, in response to the user object triggering the enlarge control 1301, the target image is enlarged in the image preview interface. The scale applied on each trigger of the enlarge (or shrink) control can be preset by business personnel; for example, each trigger of the enlarge control may enlarge the display area of the target image by a factor of 1.1. The embodiments of the present application do not limit the specific scaling sizes.
Optionally, the target image is slid in the image preview interface to adjust which portion of it, and thus of the target mind map, is displayed. As shown in fig. 13d, if the computer device detects a rightward sliding operation in the image preview interface, the target image may be slid to the right accordingly, so that its left portion appears and its right portion is hidden. In this way the user object can view any part of the content even when the target image is large, improving the user object's experience.
It should be noted that the above are only some exemplary processes for adjusting the target image in the image preview interface; depending on the computer device, the implementation may vary across application scenarios. For example, when the computer device is operated with an external mouse, the display size of the target image can be zoomed by scrolling the mouse wheel. The embodiment of the present application does not limit the specific adjustment mode.
S905: in response to a validation operation on the new root node, the target mind map is output.
The specific implementation process of the confirmation operation on the new root node described in step S905 is similar to that of the confirmation operation on the predicted root node described in the embodiment shown in fig. 3, and is not repeated herein.
In a specific implementation, in response to the confirmation operation on the new root node, the computer device may take the new root node as the starting point and search for the child nodes related to it in the target mind map by depth-first traversal, pushing forward layer by layer to finally restore the complete structure of the mind map. The process of restoring the target mind map by depth-first traversal can be summarized simply as follows: taking the root node (such as the predicted root node or a new root node selected by the user object) as the starting point, find a path and keep exploring the nodes along it; when the path cannot be extended, backtrack to the last explored node; if that node still has an unexplored branch, continue exploring from it, otherwise keep backtracking, until every path in the target mind map has been traversed. A path in the target mind map refers to a channel that starts from the root node and connects the root node and one or more child nodes in series via the mind map's connecting lines.
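A minimal sketch of this depth-first restoration follows, assuming a prior detection step has already produced the set of nodes and the mind map's connecting lines as an adjacency map; the data representation and function name are illustrative.

```python
def restore_mind_map(root, adjacency):
    """adjacency: dict mapping each node id to the node ids its lines reach."""
    tree = {root: []}          # parent -> children, i.e. the restored structure
    visited = {root}

    def dfs(node):
        for child in adjacency.get(node, ()):
            if child in visited:       # path cannot be extended here: backtrack
                continue
            visited.add(child)
            tree.setdefault(node, []).append(child)
            dfs(child)                 # keep exploring deeper along this path

    dfs(root)
    return tree
```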
After the computer device restores the complete target mind map, it can output the target mind map in an editing interface of a mind map tool (or mind map service) provided by the target application program. It should be noted that the target mind map is displayed in the editing interface of the mind map tool, which can be understood as the service interface used to edit (e.g., create) mind maps in that tool; the embodiment of the application therefore supports the user object performing editing operations on the displayed target mind map in the editing interface, such as modifying node content using the functions provided by the mind map tool.
Furthermore, the embodiment of the application supports the user object performing editing operations on the displayed target mind map in the editing interface. Specifically, the computer device updates the target mind map in response to an editing operation on it in the editing interface; the editing operation is an operation that edits the target mind map using the functions provided by the mind map tool. Editing operations may include, but are not limited to, one or more of: modifying the color, style, or content of nodes in the target mind map (including copy-and-paste operations); modifying the thickness, style, or color of the mind map's connecting lines; adding nodes; and deleting existing nodes. For example, if the connecting lines of the target mind map are solid, an editing operation may modify them into dotted lines. For another example, if the root node of the target mind map appears as a white circular box, an editing operation may modify it into a red rectangular box; and so on. The user object may edit the target mind map in the editing interface according to its own business requirements, which the embodiment of the present application does not describe in further detail.
Referring to fig. 14, an exemplary schematic diagram of an editing operation is given, taking the modification of node content as an example. As shown in fig. 14, when a root node in the editing interface 1401 is triggered, the root node is displayed in an editable state, styled differently from the other nodes; the root node shown in fig. 14 is displayed in the editing frame 1402. An edit bar 1403 is output in the editing interface 1401 and includes one or more editing options, such as an edit item and a paste item; when any option is triggered, the corresponding function is executed. If the edit item is triggered, a toolbar 1404 is output in the editing interface 1401, comprising one or more tools such as a keyboard, left branch, right branch, and font. If the keyboard tool is triggered, a virtual keyboard is displayed in the editing interface so that the user object can re-edit the content of the root node; when editing is finished, the newly edited content replaces the root node's original content.
Furthermore, the embodiment of the application also supports sharing the edited target mind map. In a specific implementation, in response to an editing-complete operation in the editing interface, the computer device can generate a shared image containing the edited target mind map and then share that image with a shared object, thereby sharing the edited target mind map. The editing-complete operation varies across mind map tools and their editing interfaces. An exemplary schematic diagram of sharing the edited target mind map is given below, taking the editing interface shown in fig. 15 as an example. As shown in fig. 15, the editing interface includes a share option 1501; when the share option 1501 is triggered, indicating that the user object wants to share the target mind map currently displayed in the editing interface, a shared image may be generated from the currently displayed target mind map, and a shared object list 1502 is displayed. The shared object list may include identifiers of shared objects, a shared image link, and the like.
Optionally, a shared object may be a recipient of the shared image, in which case the identifier of the shared object is information that uniquely identifies it (such as an account); the shared image may then be shared with any shared object in response to the user object triggering its identifier. Optionally, a shared object may also be a service provided by an application other than the target application, such as a social feed service provided by a social application, in which case the identifier of the shared object is the service identifier of that social feed service; in response to the user object triggering that service identifier, the shared image may be shared into the social feed interface provided by the social application. The social feed interface contains a stream of feed messages (or feeds) posted by one or more user objects, and the stream can be dynamically updated through a refresh operation. Optionally, if the shared image link in the shared object list is selected, the user object may share the link to the shared image, so that a recipient can obtain the shared image through the link. The embodiment of the present application does not limit the specific sharing process; the above are only some exemplary sharing embodiments.
In the embodiment of the application, in response to the restore operation for the target mind map, the computer device can call the trained target semantic segmentation model to predict the root node of the target mind map in the target image, achieving automatic root node prediction without manual intervention and reducing the workload of the user object. In addition, the embodiment of the application supports the user object reselecting a new root node in the target image when it judges the predicted root node to be wrong, so that the computer device can restore the target mind map based on the new root node, ensuring the accuracy of the finally restored target mind map. The embodiment of the application also supports editing and sharing the restored target mind map, meeting the user object's editing and sharing needs and improving the user object's experience.
The embodiments shown in fig. 3 and fig. 9 mainly describe the model application; the model training part of the embodiments of the present application is described below. It should be particularly noted that the semantic segmentation model provided in the embodiment of the present application predicts the semantic segmentation image corresponding to an input image, where the semantic segmentation image includes a foreground region in which the root node of a mind map lies. That is to say, the goal of training the initial semantic segmentation model is that the trained target semantic segmentation model can predict the region in which the root node lies, so that the correct root node of the mind map can be identified from the foreground region of the semantic segmentation image. In more detail, the embodiment of the application trains the initial semantic segmentation model by supervised learning, which can be understood as follows: after the initial semantic segmentation model is trained on a training data set, the resulting target semantic segmentation model can accurately predict the root node region in images outside the training data set. The overall idea of this supervised training is: first determine the correct semantic segmentation image corresponding to a training image, then train the initial semantic segmentation model with it; the trained target semantic segmentation model can then identify the region where the root node of the target mind map lies in any target image, and the predicted root node can be accurately obtained from the target semantic segmentation image predicted by the target semantic segmentation model. For the specific process of accurately identifying the predicted root node from the target semantic segmentation image, refer to the description of step s11 within step S902 of the embodiment shown in fig. 9, which is not repeated herein.
The model training part provided in this embodiment is described in detail below, taking the training of the initial semantic segmentation model with one training image as an example, with reference to fig. 16. FIG. 16 illustrates a flowchart of a mind map recognition method provided by an exemplary embodiment of the present application; the method may be performed by a computer device (e.g., computer device 202) and may include, but is not limited to, steps S1601-S1607:
S1601: a training image is acquired.
In a specific implementation, the training image can be obtained from a training image set, which contains a number of sample images used to train the initial semantic segmentation model; the training image is any sample image in the set. The sample images in the training image set can be divided into first sample images and second sample images. A first sample image may be a sample image containing a training mind map, synchronized directly from the internet and/or a local storage space; a second sample image is obtained by performing data enhancement processing (also called data expansion processing) on a first sample image. One or more first sample images and one or more second sample images are then combined to obtain the training image set. It should be noted that the training image set may also contain only first sample images; the embodiment of the present application does not limit the specific manner in which the sample images are acquired.
The data enhancement processing performed on a first sample image may include, but is not limited to, one or more of: blurring, scaling, mirroring, rotation, cropping, or added noise. For example, blurring (also called filtering) mainly blurs the first sample image while retaining its important information; blurring includes, but is not limited to, Gaussian blur, median blur, mean blur, and the like. Scaling enhances the data by changing the display size of the first sample image. Rotation increases the data volume by rotating the first sample image up-down and/or left-right while keeping its display area unchanged. The embodiment of the present application does not limit the specific types of data enhancement processing.
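A sketch of the named enhancement operations follows, using OpenCV and NumPy; the kernel sizes, scale factors, crop bounds, and noise level are illustrative assumptions, not values from the original disclosure.

```python
import cv2
import numpy as np

def augment(first_sample: np.ndarray) -> list:
    """Derive several second sample images from one first sample image."""
    h, w = first_sample.shape[:2]
    noise = np.random.normal(0, 10, first_sample.shape)
    return [
        cv2.GaussianBlur(first_sample, (5, 5), 0),             # blurring
        cv2.resize(first_sample, (w // 2, h // 2)),            # scaling
        cv2.flip(first_sample, 1),                             # mirroring
        cv2.rotate(first_sample, cv2.ROTATE_90_CLOCKWISE),     # rotation
        first_sample[h // 4: 3 * h // 4, w // 4: 3 * w // 4],  # cropping
        np.clip(first_sample.astype(np.float64) + noise,
                0, 255).astype(np.uint8),                      # enhanced noise
    ]
```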
Further, after the training image set is obtained, the embodiment of the application also supports labeling the sample images in the set, specifically labeling the position of the root node of the training mind map in each sample image, so that the initial semantic segmentation model to be optimized can subsequently be trained with root-node-labeled sample images. In a specific implementation, a root node labeling system can be used to label the position of the root node of the training mind map in each sample image; the root node labeling system is a labeling system that supports business personnel (or annotators) manually labeling mind map root nodes. The labeling process may proceed as follows: the root node labeling system presents one unlabeled sample image at a time, and the annotator triggers (e.g., clicks) the position of the root node in it. In response to the trigger, the system automatically records the coordinate information of the triggered position and takes it as the coordinate information of the root node in the sample image, storing it in a text document with the same name as the sample image so as to record the root node's position. The system then automatically switches to the next unlabeled image so that the annotator can continue labeling.
For example, suppose a sample image is as shown in fig. 1a, and the region where the root node of the training mind map lies is an image block (or text block) containing text; when the annotator is detected triggering any position within that image block, the center of the image block is taken as the coordinate position of the root node, and the coordinate information of that center is stored in a text document with the same file name as the sample image, thereby labeling the root node. For another example, suppose a sample image is as shown in fig. 1b, and the root node of the training mind map is the end point of a line; when the annotator is detected triggering that end point, the coordinate information of the end point is taken as the coordinate information of the root node and stored in a text document with the same file name as the sample image.
Through this process, the root node of every sample image in the training image set can be labeled, yielding a root node annotation file for each sample image. That is to say, the data set used to train the initial semantic segmentation model includes, in addition to the training image set, a corresponding set of root node annotation files, one per sample image, each recording the coordinate information of the root node in its sample image.
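A sketch of what the labeling system records follows, assuming a one-line "x,y" format for the text document that shares its name with the sample image; the format and function name are illustrative assumptions.

```python
from pathlib import Path

def save_root_annotation(image_path: str, root_xy: tuple) -> None:
    # The annotation file takes the same name as the sample image.
    label_path = Path(image_path).with_suffix(".txt")
    label_path.write_text(f"{root_xy[0]},{root_xy[1]}\n", encoding="utf-8")

# e.g. save_root_annotation("samples/map_001.png", (412, 236))
# writes "412,236" to samples/map_001.txt
```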
S1602: perform first semantic segmentation processing on the training mind map in the training image to obtain a first semantic label image.
As described above, the embodiment of the present application trains the initial semantic segmentation model by supervised learning. Before training, the first semantic label image used to train the model must be determined; its foreground region is the region where the root node of the training mind map lies in the training image. The process of performing the first semantic segmentation processing on the training mind map to obtain the first semantic label image can be summarized as follows: first extract the saliency region of the training image to obtain the attention image (also called attention heat map) corresponding to the training image; then perform feature extraction on the attention image to obtain a feature map; finally, redefine the labels of the feature points in the feature map by combining the attention image and the feature map, obtaining the first semantic label image. A specific implementation of determining the first semantic label image corresponding to the training image is given below in conjunction with FIG. 17, and may include, but is not limited to, steps s21-s23:
s21: perform region extraction processing on the training image to obtain the attention image.
As described above, the training image corresponds to a root node annotation file containing the preset coordinate information of the root node in the training mind map; this coordinate information was obtained by an annotator labeling the training image with the root node labeling system. The root node annotation file corresponding to the training image is therefore obtained, and region extraction processing (also called saliency region extraction) is performed on the training image based on the coordinate information in the file, generating the attention image. The attention image comprises a saliency region centered on the coordinate position indicated by the coordinate information; as shown in fig. 17, the color of the saliency region in the attention image is deeper than that of the other regions.
Specifically, the saliency region extraction rule may be: radiate outward from the coordinate position indicated by the coordinate information in the root node annotation file as the center point (also called the attention center), generating the attention image. In this implementation, the saliency region of the generated attention image is a circular region centered on that coordinate position, and the color within the saliency region is deeper than in the other regions of the attention image; the deeper the color of an area, the higher the user object's attention to it. In a specific implementation, considering that the radius radiating outward from the attention center is related to the size of the training image, the display size information of the training image, comprising its width and height, is first acquired, and the radiation radius of the saliency region is calculated from it as follows:
r = min(w, h) × RADIUS_RATIO    (formula 1)
where r is the radiation radius of the saliency region, w and h are the width and height of the training image, min(w, h) is the smaller of w and h, and RADIUS_RATIO is an empirical value. The specific value of RADIUS_RATIO can be adjusted according to the characteristics of the training images actually used; in the embodiment of the present application, a RADIUS_RATIO of 0.25 yields a suitable radiation radius.
Then, after the radiation radius is calculated from the above formula, the saliency region of the training image can be determined from the coordinate information in the root node annotation file and the radiation radius. The saliency region determined in the attention image shown in fig. 17 is the circular area of the training image centered on the coordinate position indicated by the coordinate information, with the radiation radius r as its radius.
Finally, the heat values of the feature points in the saliency region are calculated to generate the attention image. The embodiment of the application uses the heat value to represent the user object's degree of attention to a pixel point (or feature point): the higher a feature point's heat value in the attention image, the higher the user object's attention to it and the higher the probability that it belongs to the region where the root node lies. It can be understood that the attention center of the saliency region receives the highest attention, so the feature point at the center has the highest heat value, and the heat values of the other feature points gradually attenuate outward according to the radiation radius r. The heat value of each feature point in the saliency region is calculated as follows:
heatmap(j, i) = max(0, 1 − √((i − x)² + (j − y)²) / r)    (formula 2)
where heatmap is the attention image (or attention heat map), heatmap(j, i) is the heat value of the pixel point (or feature point) in row j and column i of the attention image, and (x, y) is the attention center labeled in the attention image. As the formula shows, the maximum heat value of a feature point in the attention image is 1 and the minimum is 0; the larger a feature point's heat value, the higher its degree of attention and the higher the probability that it belongs to the region where the root node lies.
Based on the above process, saliency region extraction can be performed on the training image to generate the attention image; the attention image contains the saliency region, within which the heat value of each feature point is greater than the heat threshold 0, while the heat value of every feature point outside the saliency region is 0.
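A sketch of this attention-image generation for step s21 follows, combining formula 1 (radiation radius) with formula 2 as reconstructed above; the linear heat decay is an assumption consistent with the stated properties (heat 1 at the attention center, falling to 0 at the edge of the saliency region and beyond).

```python
import numpy as np

RADIUS_RATIO = 0.25   # the empirical value cited above

def attention_image(w: int, h: int, x: int, y: int) -> np.ndarray:
    r = min(w, h) * RADIUS_RATIO                     # formula 1
    jj, ii = np.mgrid[0:h, 0:w]                      # row index j, column index i
    dist = np.sqrt((ii - x) ** 2 + (jj - y) ** 2)    # distance to attention center (x, y)
    return np.clip(1.0 - dist / r, 0.0, 1.0)         # formula 2, clipped to [0, 1]
```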
It should be noted that the embodiment of the present application also supports setting the feature points in the saliency region to the same heat value; in that implementation, the heat value of each feature point in the saliency region is a preset value (e.g., 1), and the heat value of every feature point outside the saliency region is 0. Alternatively, the embodiment of the application also supports using a prediction model to predict the saliency region of the training image. The embodiment of the present application does not limit the specific implementation of the saliency region extraction rule; according to actual application requirements, other rules with the same function may replace the rule described above.
s22: perform feature extraction on the attention image to obtain a feature map. The purpose of feature extraction is to map the high-dimensional feature space of the attention image to a low-dimensional feature space. Each pixel point in the extracted feature map is assigned a class label, and the label value of the class label is the pixel value of the corresponding pixel point; moreover, adjacent pixel points belonging to the same object have the same pixel value, so that when the feature map is visualized, the same object (such as a vehicle, a human body, or a table) appears in the same color, which facilitates the classification and identification of the objects in the feature map. The embodiment of the application does not limit which feature extraction algorithm is used; for example, a neural network with a feature extraction function may be used to perform feature extraction on the attention image to obtain the corresponding feature map.
S23: the category labels in the feature map are redefined according to the attention image to obtain a first semantic label image. The category label redefinition may be implemented as follows: the thermal value of each feature point in the salient region of the attention image is used to replace the label value of the category label of the corresponding feature point in the feature map, thereby redefining the pixel value of each feature point in the feature map.
For example, assume the thermal value of a target feature point (e.g., any feature point) within the salient region of the attention image is 0.5, and the label value of the category label of that target feature point in the feature map is 156. The redefinition then assigns the thermal value of the target feature point in the attention image to its category label in the feature map, so that the label value becomes 0.5. Through this category label redefinition, the salient region (thermal value greater than 0, and larger closer to the center) can be separated from the other regions (thermal value 0), resulting in a first semantic label image comprising a foreground region and a background region. The pixel values of feature points in the foreground region are non-zero, so the foreground region appears as a circular area that is white at the attention center and shades into progressively darker gray outward; the pixel values in the background region are 0, so the background region appears black.
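A minimal sketch of this redefinition step, assuming the feature map and the attention image share the same spatial dimensions (the function name is hypothetical):

```python
import numpy as np

def redefine_class_labels(feature_map, attention):
    """Replace the label value of each feature point inside the salient
    region with its thermal value; everything else becomes background (0)."""
    label_image = feature_map.astype(np.float32)
    salient = attention > 0                      # feature points inside the salient region
    label_image[salient] = attention[salient]    # e.g. class label 156 -> thermal value 0.5
    label_image[~salient] = 0.0                  # background region renders black
    return label_image                           # first semantic label image
```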
Through this pipeline of salient region extraction → feature extraction → label redefinition on the training image, the real first semantic label image of the training image is obtained, so that supervised training can subsequently be performed on the initial semantic segmentation model to be optimized using this real first semantic label image. The trained target semantic segmentation model can then accurately predict the semantic segmentation image corresponding to any image input to it.
S1603: the initial semantic segmentation model is called to perform second semantic segmentation processing on the training mind map in the training image to obtain a second semantic label image.
First, the semantic segmentation model is introduced. The semantic segmentation model provided by the embodiment of the application may also be called a semantic segmentation network or a segmentation network, and is a network model for realizing semantic segmentation of images. Semantic segmentation refers to a deep learning algorithm that associates a label or class with each pixel of an image and can be used to identify the set of pixels that make up a given class of region. A simple example of semantic segmentation divides an image into two classes: for the image shown in fig. 18a, which contains a swimming penguin, semantic segmentation yields two different classes of image pixels, namely penguin and background. Similarly, when semantic segmentation is applied in the embodiment of the present application, it is also a binary classification; that is, the semantic label image obtained by semantically segmenting the training image may include two different classes of image pixels. As shown in fig. 18b, the semantic label image includes a root node region (i.e., the foreground region mentioned earlier) and a background region (i.e., the region of the training image other than the foreground region).
An exemplary model structure of the semantic segmentation model provided in the embodiment of the present application is shown in fig. 19. The model mainly includes a downsampling module (stemnet), an N-stage feature extraction module (e.g., 4stageconv), a feature fusion module (feature concat), and a classification module (segmenthead). The downsampling module (stemnet) includes two 3 × 3 convolutions with stride 2, which downsample the training image input to the initial semantic segmentation model to 1/4 of its original resolution (i.e., the resolution before the training image is input to the model). Of course, the convolution kernel and stride of the convolutions in the downsampling module may be changed according to different service requirements, which is not limited in the embodiment of the present application. In the N-stage feature extraction module (e.g., 4stageconv), N is an integer greater than 1. Taking N = 4 as an example: the 1st stage includes 4 residual units, each containing two 3 × 3 convolutions; the 2nd, 3rd and 4th stages include 1, 4 and 3 multi-resolution blocks, respectively, and each branch in a multi-resolution group convolution contains 4 residual units. Through the 4stageconv, feature maps of different resolutions are connected in parallel: each branch keeps its own resolution while different branches use different resolutions, and paths (the oblique lines between branches in fig. 19) are added between branches to form a high-resolution network. The high-resolution network extracts features from the downsampled image at different resolutions, enriching the features the model acquires. The feature fusion module (feature concat) upsamples the feature maps of the lower-resolution branches to the high resolution and fuses the resulting same-resolution feature maps into a new feature map. The lower-resolution branches here are the three lower-resolution layers shown in fig. 19; that is, the feature map of each of the three lower-resolution layers is upsampled to the resolution of the highest-resolution layer, so that all feature maps share the same resolution. The upsampling method may include, but is not limited to, linear interpolation (e.g., bilinear interpolation) or deconvolution, and is not limited in the embodiment of the present application. The classification module (segmenthead) superimposes the features of the same-resolution feature maps to predict the second semantic label image corresponding to the training image. Specifically, the same-resolution feature maps are connected (e.g., superimposed) by a 1 × 1 convolution to obtain a mixed representation; the mixed representation of each pixel point is then passed to the classification module, which predicts the second semantic label image.
Based on the above introduction to the model structure of the semantic segmentation model, the following describes the implementation process of performing the second semantic segmentation processing on the training image by using the initial semantic segmentation model. The specific implementation may include the following steps:
First, after a training image is acquired, it may be preprocessed, and the preprocessed training image of the target dimension is used as the network input of the initial semantic segmentation model. The preprocessing may include resizing the training image with its short side as the standard. Preprocessing in this way minimizes the computational redundancy that compressing or cropping the image would cause, and preserves the structural information of the image frame to the greatest extent. When the width w of the training image is greater than or equal to the height h (i.e., the short side is the height h), the height may be fixed to a preset value (e.g., 448), and the dimension of the preprocessed training image is n × 448 × w × 3. When the width w is smaller than the height h (i.e., the short side is the width w), the width may be fixed to the preset value (e.g., 448), and the dimension of the preprocessed training image is n × h × 448 × 3. Here n is the number of training images in each batch input to the initial semantic segmentation model, and 3 is the number of channels.
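A sketch of this preprocessing step follows; scaling the long side proportionally and using bilinear resampling are assumptions, since the embodiment only fixes the short side to the preset value.

```python
import cv2

SHORT_SIDE = 448  # the preset value from the embodiment

def preprocess(image):
    """Resize so the short side equals 448 while keeping the aspect
    ratio, avoiding the redundancy of compressing or cropping."""
    h, w = image.shape[:2]
    scale = SHORT_SIDE / min(h, w)
    out = cv2.resize(image, (round(w * scale), round(h * scale)),
                     interpolation=cv2.INTER_LINEAR)
    return out  # shape (448, w', 3) when w >= h, otherwise (h', 448, 3)
```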
Second, the downsampling module (stemnet) included in the initial semantic segmentation model is called to downsample the preprocessed training image, obtaining a first-resolution image. Then the multi-stage feature extraction module (e.g., the N-stage convolutional layers) in the initial semantic segmentation model is called to extract features from the first-resolution image at different resolutions, which enriches the extracted image features and yields N feature maps of different resolutions.
Next, the feature fusion module (feature concat) in the initial semantic segmentation model is called to perform feature fusion processing on the N feature maps, obtaining a fused target feature map. As shown in fig. 20, the feature maps whose resolution is lower than the resolution threshold are first upsampled, giving N feature maps of the same resolution; the resolution threshold may be the resolution of the highest-resolution layer in the multi-stage feature extraction module. The N same-resolution feature maps are then fused (i.e., superimposed) to obtain the target feature map. At this point, each pixel point in the target feature map has a new representation: the mixed representation of the corresponding pixel points in the N feature maps.
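A minimal PyTorch sketch of this fusion step; the branch channel counts and resolutions in the example are illustrative assumptions, and bilinear interpolation is one of the upsampling methods the embodiment permits.

```python
import torch
import torch.nn.functional as F

def fuse_features(feature_maps):
    """Upsample every map below the resolution threshold (the highest-
    resolution branch) and superimpose them along the channel dimension,
    so each pixel carries a mixed representation of all N maps."""
    target_hw = feature_maps[0].shape[-2:]        # highest-resolution branch
    aligned = [fm if fm.shape[-2:] == target_hw
               else F.interpolate(fm, size=target_hw, mode="bilinear",
                                  align_corners=False)
               for fm in feature_maps]
    return torch.cat(aligned, dim=1)              # target feature map

# e.g. four branches at 1x, 1/2, 1/4 and 1/8 of the stem resolution
branches = [torch.randn(1, c, 112 // s, 112 // s)
            for c, s in [(18, 1), (36, 2), (72, 4), (144, 8)]]
fused = fuse_features(branches)                   # shape (1, 270, 112, 112)
```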
Finally, the classification module (segmenthead) in the initial semantic segmentation model is called to classify the target feature map, obtaining the second semantic label image. The classification module can be simply understood as an activation function (such as the softmax function) that converts the multi-class output values into a probability distribution over the range [0, 1] that sums to 1; the softmax function is defined as follows:
$$\mathrm{softmax}(z_k)=\frac{e^{z_k}}{\sum_{c=1}^{C}e^{z_c}}$$
where $z_k$ is the output value (e.g., pixel value) of the $k$-th class and $C$ is the number of classes to be classified. The number of classes in the embodiment of the present application is 2; that is, the embodiment performs binary classification on the target feature map, into a foreground class and a background class, and the second semantic label image obtained by the classification is represented as a foreground region and a background region.
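Putting the pieces together, the classification module can be sketched as a 1 × 1 convolution followed by the softmax above; the module and variable names are assumptions.

```python
import torch
import torch.nn as nn

class SegmentHead(nn.Module):
    """1x1 convolution over the mixed representation, then softmax over
    C = 2 classes (foreground root-node region vs. background)."""
    def __init__(self, in_channels, num_classes=2):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, num_classes, kernel_size=1)

    def forward(self, target_feature_map):
        logits = self.conv(target_feature_map)    # (N, 2, H, W)
        return torch.softmax(logits, dim=1)       # per-pixel distribution summing to 1

head = SegmentHead(in_channels=270)
probs = head(torch.randn(1, 270, 112, 112))       # predicted second semantic label image
```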
It should be noted that fig. 19 shows an exemplary module structure of the semantic segmentation model; in practical applications, other network structures with the same functions may be used instead, and the specific structural style of the network is not limited in the embodiment of the present application.
S1604: a loss function of the initial semantic segmentation model is obtained, and a loss value of the loss function is calculated according to the first semantic label image and the second semantic label image, so as to train the initial semantic segmentation model according to the loss value.
In specific implementation, the loss function of the initial semantic segmentation model is used to calculate a loss value, and the model is continuously optimized according to the loss value to obtain the trained target semantic segmentation model. The loss function (also called the optimization function) of the initial semantic segmentation model can be expressed as:
$$\mathrm{pixeloss}=-\sum y_{\mathrm{true}}\,\log\!\left(y_{\mathrm{pred}}\right)\qquad\text{(Equation 4)}$$
where pixeloss is the loss function, $y_{\mathrm{true}}$ is the real first semantic label image corresponding to the training image, and $y_{\mathrm{pred}}$ is the second semantic label image predicted for the training image by the initial semantic segmentation model.
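Equation 4 is a standard pixel-wise cross entropy; a direct sketch follows (the clamping is an added numerical guard against log(0), not part of the equation):

```python
import torch

def pixeloss(y_true, y_pred, eps=1e-7):
    """-sum(y_true * log(y_pred)) over all pixels and classes, where
    y_true is the real first semantic label image and y_pred the
    predicted second semantic label image."""
    return -(y_true * torch.log(y_pred.clamp(min=eps))).sum()
```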
Further, if the calculated loss value satisfies the preset condition, the initial semantic segmentation model has reached the convergence condition, that is, it already has good prediction performance, and the model obtained in the current training round can be determined as the trained target semantic segmentation model. The loss value satisfying the preset condition may include, but is not limited to: the loss value being smaller than a loss threshold, or the difference between the loss values of adjacent training rounds being smaller than a difference threshold (e.g., 0). It should be noted that the loss threshold may differ for different network models, which is not detailed here. Otherwise, if the calculated loss value does not satisfy the preset condition, the prediction performance of the initial semantic segmentation model does not yet meet the requirement, and the model is iteratively trained; specifically, it is optimized by gradient descent, so that the semantic label image predicted by the optimized model gets closer to the real semantic label image. Optimizing the initial semantic segmentation model by gradient descent can be simply understood as follows: the loss value is used to optimize the network parameters of the model, and the optimized model continues to be trained with other samples in the training image set as long as the number of training rounds is less than the preset total.
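The convergence logic can be sketched as follows, reusing the pixeloss sketch above; the threshold and round-count values are placeholders, since the embodiment notes that thresholds differ between network models.

```python
import torch

def train_model(model, dataloader, optimizer,
                loss_threshold=0.01, max_rounds=100):
    """Optimize by gradient descent until the loss value satisfies the
    preset condition (here: falling below the loss threshold)."""
    for _ in range(max_rounds):                    # training times < preset total
        for image, first_label in dataloader:
            second_label = model(image)            # predicted semantic label image
            loss = pixeloss(first_label, second_label)
            optimizer.zero_grad()
            loss.backward()                        # gradient descent on the loss value
            optimizer.step()
        if loss.item() < loss_threshold:           # convergence condition reached
            break
    return model                                   # trained target segmentation model
```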
Through the above implementation process, a trained target semantic segmentation model can be obtained. Because the training mind maps contained in the training images of the training image set differ from one another, the trained target semantic segmentation model can predict root nodes in a variety of mind maps; it therefore has good generalization capability and can perform root node prediction processing on different mind maps.
S1605: a target image containing a target mind map is acquired.
S1606: in response to a restore operation for the target mind map, the target semantic segmentation model is called to perform root node prediction processing on the target image, and the predicted root node obtained by the prediction is output.
S1607: in response to a confirmation operation on the predicted root node, the target mind map is restored based on the predicted root node, and the restored target mind map is output.
It should be noted that, for the specific implementation of steps S1605 to S1607, reference may be made to the related descriptions in the embodiments shown in fig. 3 and/or fig. 9, which are not repeated here.
In the embodiment of the application, training the initial semantic segmentation model with training images containing different training mind maps is supported, so that the trained target semantic segmentation model can predict root nodes in a variety of mind maps; this ensures that the model has good generalization capability and can perform root node prediction processing on different mind maps. In addition, in response to the restore operation for the target mind map, the computer device can call the trained target semantic segmentation model to automatically predict the root node of the target mind map in the target image. No manual intervention is needed, which reduces the workload of the user object and improves the user experience.
The method of the embodiments of the present application has been described in detail above; to better implement this method, an apparatus of the embodiments of the present application is provided below.
FIG. 21 shows a schematic diagram of an exemplary mind map identifying apparatus according to an embodiment of the present application; the apparatus may be a computer program (including program code) running on a computer device, and may be used to perform some or all of the steps in the method embodiments shown in FIGS. 3, 9 and 16. The apparatus comprises the following units:
an acquisition unit 2101 configured to acquire a target image including a target mind map;
a processing unit 2102 for displaying a prediction root node in the target mind map in the target image in response to a restore operation for the target mind map;
the processing unit 2102 is further configured to output a target mind map in response to the confirmation operation on the prediction root node, the target mind map being in an editable state.
In one implementation, the processing unit 2102, when displaying the predicted root node in the target mind map in the target image, is specifically configured to:
marking and displaying the predicted root node in the target mind map in the target image;
the annotation display includes: displaying the predicted root node in a visualized display manner; or displaying the predicted root node in the form of a label identifier;
wherein the label identifier is displayed in the area where the predicted root node is located, or is displayed in the target image in the form of an annotation.
In one implementation, the processing unit 2102, when obtaining the target image including the target mind map, is specifically configured to:
displaying a function selection interface, wherein the function selection interface comprises an image-to-mind-map option;
responding to the triggering of the image-to-mind-map option, and displaying an image acquisition interface;
and acquiring a target image containing the target mind map in the image acquisition interface.
In one implementation, the image acquisition interface includes an image uploading option, and the processing unit 2102 is configured to, when acquiring a target image including a target mind map in the image acquisition interface, specifically:
in response to triggering the image upload option, displaying at least one candidate image;
any candidate image is selected from at least one candidate image, the selected candidate image is used as a target image, and a mind map included in the candidate image is used as a target mind map included in the target image.
In one implementation, the image acquisition interface includes an image scanning option, and the processing unit 2102 is configured to, when acquiring a target image including a target mind map in the image acquisition interface, specifically:
and responding to the triggering of the image scanning option, and performing scanning operation on a target image containing the target mind map to acquire the target image.
In one implementation, the processing unit 2102 is further configured to:
in response to the operation of reselecting the root node in the target image, the marking display of the predicted root node is cancelled in the target image, and the newly reselected root node is marked and displayed in the target image;
outputting a target mind map in response to a validation operation on the new root node;
wherein the annotation display of the prediction root node is revoked for indicating: the prediction root node is not validated.
In one implementation, the processing unit 2102 is further configured to:
updating the target mind map in response to an editing operation on the target mind map;
and responding to the sharing operation of the updated target mind map, and sharing the shared image containing the updated target mind map.
In one implementation, when displaying the predicted root node in the target mind map in the target image in response to the restore operation for the target mind map, the processing unit 2102 is specifically configured to:
in response to the restore operation for the target mind map, calling the target semantic segmentation model to perform root node prediction processing on the target image to obtain an initial predicted root node in the target mind map;
determining a node distance between the initial predicted root node and each node according to the initial predicted root node and each node in the target mind map;
and taking the node with the shortest node distance to the initial predicted root node as the predicted root node.
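The snapping step just described can be sketched as follows; representing nodes by 2-D coordinates and using the Euclidean distance are assumptions, since the embodiment only speaks of a node distance.

```python
import math

def snap_to_nearest_node(initial_root, nodes):
    """Return the node with the shortest node distance to the initially
    predicted root node; that node becomes the predicted root node."""
    return min(nodes, key=lambda node: math.dist(node, initial_root))

# e.g. initial prediction (120, 80) snaps to the node at (118, 83)
predicted_root = snap_to_nearest_node((120, 80), [(118, 83), (200, 40), (90, 150)])
```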
In one implementation, when invoking the target semantic segmentation model to perform root node prediction processing on the target image to obtain the initial predicted root node in the target mind map, the processing unit 2102 is specifically configured to:
calling the target semantic segmentation model to perform semantic segmentation processing on the target image to obtain a target semantic label image corresponding to the target image, wherein the target semantic label image comprises a foreground region;
taking a target connected domain of the foreground region as a salient region in a target image;
and taking the central point of the salient region as the initial predicted root node in the target mind map.
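A sketch of this extraction using SciPy's connected-component labelling; choosing the largest connected component as the target connected domain is an assumption, since the embodiment does not specify how the target is selected.

```python
import numpy as np
from scipy import ndimage

def initial_root_node(semantic_label_image):
    """Extract the target connected domain of the foreground region and
    return its center point as the initial predicted root node."""
    foreground = semantic_label_image > 0
    labeled, count = ndimage.label(foreground)          # connected-component labelling
    if count == 0:
        return None                                     # no foreground predicted
    sizes = ndimage.sum(foreground, labeled, index=range(1, count + 1))
    target = int(np.argmax(sizes)) + 1                  # assumed: largest component
    cy, cx = ndimage.center_of_mass(labeled == target)  # center of the salient region
    return (cx, cy)
```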
In one implementation, the processing unit 2102 is further configured to:
acquiring a training image, wherein the training image comprises a training mind map;
performing first semantic segmentation processing on the training mind map in the training image to obtain a first semantic label image; and,
calling the initial semantic segmentation model to perform second semantic segmentation processing on the training mind map in the training image to obtain a second semantic label image;
obtaining a loss function of the initial semantic segmentation model, and calculating a loss value of the loss function according to the first semantic label image and the second semantic label image;
if the loss value meets the preset condition, the initial semantic segmentation model reaches a convergence condition, and the initial semantic segmentation model is used as a trained target semantic segmentation model;
and if the loss value does not meet the preset condition, performing iterative training on the initial semantic segmentation model.
In one implementation, the processing unit 2102 is configured to perform a first semantic segmentation process on a training mind map in a training image to obtain a first semantic label image, and specifically configured to:
performing region extraction processing on the training image to obtain an attention image, wherein the attention image comprises a salient region;
extracting features of the attention map image to obtain a feature map;
and redefining the class labels in the feature map according to the attention map to obtain a first semantic label image.
In one implementation, the processing unit 2102 is configured to perform a region extraction process on the training image, and when obtaining the attention image, specifically configured to:
acquiring a root node annotation file corresponding to the training image, wherein the root node annotation file comprises preset coordinate information of the root node in the training mind map;
performing region extraction processing on the training image based on the coordinate information to generate an attention image, wherein the attention image comprises: a salient region centered on the coordinate position indicated by the coordinate information.
In one implementation, the salient region refers to: a circular area centered on the coordinate position indicated by the coordinate information; the processing unit 2102 is configured to perform region extraction processing on the training image based on the coordinate information, and when obtaining the attention image, specifically configured to:
acquiring display size information of a training image, and calculating a radiation radius according to the display size information;
determining a salient region in the training image according to the coordinate information and the radiation radius;
calculating the thermal value of each feature point in the salient region to generate an attention image; the thermal value of a feature point within a salient region in the attention image is greater than a thermal value threshold.
In one implementation, the processing unit 2102 is configured to redefine the category label in the feature map according to the attention map to obtain a first semantic label image, and is specifically configured to:
and replacing the label value of the category label of the corresponding feature point in the feature map by adopting the thermal value of each feature point in the salient region in the attention image to obtain a first semantic label image.
In an implementation manner, the processing unit 2102 is configured to invoke the initial semantic segmentation model to perform the second semantic segmentation on the training mind map in the training image, and when obtaining the second semantic label image, is specifically configured to:
calling a down-sampling module in the initial semantic segmentation model to perform down-sampling processing on the training image to obtain a first resolution image;
calling N-order convolutional layers in the initial semantic segmentation model, and performing feature extraction on the first resolution image to obtain N feature maps with different resolutions; n is an integer greater than 1;
calling a feature fusion module in the initial semantic segmentation model, and performing feature fusion processing on the N feature graphs to obtain a target feature graph;
and calling a classification module in the initial semantic segmentation model to classify the target feature map to obtain a second semantic label image.
In an implementation manner, the processing unit 2102 is configured to invoke a feature fusion module in the initial semantic segmentation model, perform feature fusion processing on the N feature maps, and when obtaining the target feature map, specifically configured to:
carrying out up-sampling processing on the feature map with the resolution smaller than the resolution threshold value in the N feature maps to obtain N feature maps with the same resolution;
and carrying out fusion processing on the N feature maps with the same resolution to obtain a target feature map.
According to an embodiment of the present application, the units of the mind map identifying apparatus shown in fig. 21 may be separately or wholly combined into one or several other units, or one (or more) of the units may be further split into multiple functionally smaller units, which can achieve the same operation without affecting the technical effect of the embodiment of the present application. The above units are divided based on logical functions; in practical applications, the function of one unit may be realized by multiple units, or the functions of multiple units may be realized by one unit. In other embodiments of the present application, the mind map identifying apparatus may also include other units; in practical applications, these functions may also be implemented with the assistance of other units, through the cooperation of multiple units. According to another embodiment of the present application, the mind map identifying apparatus shown in fig. 21 may be constructed by running a computer program (including program code) capable of executing the steps of the methods shown in fig. 3, 9 and 16 on a general-purpose computing device, such as a computer comprising a processing element such as a central processing unit (CPU) and storage elements such as a random access memory (RAM) and a read-only memory (ROM), thereby implementing the mind map identification method of the embodiment of the present application. The computer program may be recorded on, for example, a computer-readable recording medium, and loaded into and run on the above computing device via the computer-readable recording medium.
In this embodiment, the processing unit 2102, in response to the restore operation for the target mind map, may perform prediction processing on the root node of the target mind map in the target image and display the predicted root node in the target image. Automatic prediction of the root node in the target mind map is thereby achieved; compared with manually marking the root node, this avoids the manual confirmation of the root node in the mind map and reduces the workload of the user object. Further, in response to the confirmation operation of the user object on the predicted root node, the processing unit 2102 may continue to restore the complete target mind map based on the predicted root node; restoring from an accurate predicted root node improves the accuracy of the restored target mind map. With this scheme, the complete target mind map can be restored quickly and accurately without manual intervention, improving the restoration efficiency of the target mind map.
Fig. 22 shows a schematic structural diagram of a computer device according to an exemplary embodiment of the present application. Referring to fig. 22, the computer device includes a processor 2201, a communication interface 2202 and a computer-readable storage medium 2203, which may be connected by a bus or in other ways. The communication interface 2202 is used to receive and transmit data. The computer-readable storage medium 2203 may be stored in a memory of the computer device and is configured to store a computer program comprising program instructions; the processor 2201 is configured to execute the program instructions stored by the computer-readable storage medium 2203. The processor 2201 (or CPU) is the computing core and control core of the computer device; it is adapted to implement one or more instructions, and specifically to load and execute the one or more instructions so as to realize the corresponding method flow or function.
Embodiments of the present application also provide a computer-readable storage medium (memory), which is a memory device in a computer device used to store programs and data. The computer-readable storage medium here may include both a built-in storage medium of the computer device and, of course, an extended storage medium supported by the computer device. The computer-readable storage medium provides a memory space that stores the processing system of the computer device. One or more instructions, which may be one or more computer programs (including program code), are also stored in this memory space and are suitable for being loaded and executed by the processor 2201. It should be noted that the computer-readable storage medium may be a high-speed RAM memory or a non-volatile memory, such as at least one disk memory; optionally, it may also be at least one computer-readable storage medium located remotely from the aforementioned processor.
In one embodiment, the computer-readable storage medium has one or more instructions stored therein; one or more instructions stored in the computer-readable storage medium are loaded and executed by the processor 2201 to implement the corresponding steps in the above-described mind map identification method embodiments; in particular implementations, one or more instructions in the computer-readable storage medium are loaded by the processor 2201 and perform the following steps:
acquiring a target image containing a target mind map;
displaying a predicted root node in the target mind map in the target image in response to a restore operation for the target mind map;
and responding to the confirmation operation of the prediction root node, and outputting the target mind map, wherein the target mind map is in an editable state.
In one implementation, one or more instructions in the computer-readable storage medium are loaded by the processor 2201 and when executing the method for displaying the predicted root node in the target mind map in the target image, the method specifically performs the following steps:
marking and displaying the predicted root node in the target mind map in the target image;
the annotation display includes: displaying the predicted root node in a visualized display manner; or displaying the predicted root node in the form of a label identifier;
wherein the label identifier is displayed in the area where the predicted root node is located, or is displayed in the target image in the form of an annotation.
In one implementation, one or more instructions in the computer-readable storage medium are loaded by the processor 2201 and when executing the step of obtaining the target image containing the target mind map, the following steps are specifically performed:
displaying a function selection interface, wherein the function selection interface comprises an image-to-mind-map option;
responding to the triggering of the image-to-mind-map option, and displaying an image acquisition interface;
and acquiring a target image containing the target mind map in the image acquisition interface.
In one implementation, the image acquisition interface includes an image upload option, and one or more instructions in the computer-readable storage medium are loaded by the processor 2201 and when executing the step of acquiring the target image including the target mind map in the image acquisition interface, the following steps are specifically executed:
in response to triggering the image upload option, displaying at least one candidate image;
any candidate image is selected from at least one candidate image, the selected candidate image is used as a target image, and the mind map contained in the candidate image is used as a target mind map contained in the target image.
In one implementation, the image acquisition interface includes an image scanning option, and one or more instructions in the computer-readable storage medium are loaded by the processor 2201 and when executing the steps of acquiring a target image containing a target mind map in the image acquisition interface, the following steps are specifically executed:
and responding to the triggering of the image scanning option, and performing scanning operation on a target image containing the target mind map to acquire the target image.
In one implementation, one or more instructions in the computer-readable storage medium are loaded by the processor 2201 and further perform the steps of:
in response to the operation of reselecting the root node in the target image, the marking display of the predicted root node is cancelled in the target image, and the newly reselected root node is marked and displayed in the target image;
outputting a target mind map in response to a validation operation on the new root node;
wherein the annotation display of the prediction root node is revoked for indicating: the prediction root node is not validated.
In one implementation, one or more instructions in the computer-readable storage medium are loaded by the processor 2201 and further perform the steps of:
updating the target mind map in response to an editing operation on the target mind map;
and responding to the sharing operation of the updated target mind map, and sharing the shared image containing the updated target mind map.
In one implementation, one or more instructions in the computer-readable storage medium are loaded by the processor 2201, and when displaying the predicted root node in the target mind map in the target image in response to the restore operation for the target mind map, the following steps are specifically performed:
in response to the restore operation for the target mind map, calling the target semantic segmentation model to perform root node prediction processing on the target image to obtain an initial predicted root node in the target mind map;
determining a node distance between the initial predicted root node and each node according to the initial predicted root node and each node in the target mind map;
and taking the node with the shortest node distance to the initial predicted root node as the predicted root node.
In one implementation, one or more instructions in the computer-readable storage medium are loaded by the processor 2201 and when executing the calling of the target semantic segmentation model to perform root node prediction processing on the target image to obtain an initial predicted root node in the target mind map, the following steps are specifically performed:
calling a target semantic segmentation model to perform semantic segmentation processing on a target image to obtain a target semantic tag image corresponding to the target image, wherein the target semantic tag image comprises a foreground region;
taking a target connected domain of the foreground region as a salient region in a target image;
the central point of the salient region is used as an initial prediction root node in the target mind map.
In one implementation, one or more instructions in the computer-readable storage medium are loaded by the processor 2201 and further perform the steps of:
acquiring a training image, wherein the training image comprises a training mind map;
performing first semantic segmentation processing on the training mind map in the training image to obtain a first semantic label image; and,
calling the initial semantic segmentation model to perform second semantic segmentation processing on the training mind map in the training image to obtain a second semantic label image;
obtaining a loss function of the initial semantic segmentation model, and calculating a loss value of the loss function according to the first semantic label image and the second semantic label image;
if the loss value meets the preset condition, the initial semantic segmentation model reaches a convergence condition, and the initial semantic segmentation model is used as a trained target semantic segmentation model;
and if the loss value does not meet the preset condition, performing iterative training on the initial semantic segmentation model.
In one implementation, one or more instructions in the computer-readable storage medium are loaded by the processor 2201 and when performing the first semantic segmentation process on the training mind map in the training image to obtain the first semantic label image, the following steps are specifically performed:
performing region extraction processing on the training image to obtain an attention image, wherein the attention image comprises a salient region;
extracting features of the attention map image to obtain a feature map;
and redefining the class labels in the feature map according to the attention map to obtain a first semantic label image.
In one implementation, one or more instructions in the computer-readable storage medium are loaded by the processor 2201 and when performing the region extraction processing on the training image to obtain the attention image, the following steps are specifically performed:
acquiring a root node annotation file corresponding to the training image, wherein the root node annotation file comprises preset coordinate information of the root node in the training mind map;
performing region extraction processing on the training image based on the coordinate information to generate an attention image, wherein the attention image comprises: a salient region centered on the coordinate position indicated by the coordinate information.
In one implementation, the salient region refers to: a circular area centered on the coordinate position indicated by the coordinate information; one or more instructions in the computer-readable storage medium are loaded by the processor 2201, and when the area extraction processing is performed on the training image based on the coordinate information to obtain the attention image, the following steps are specifically performed:
acquiring display size information of a training image, and calculating a radiation radius according to the display size information;
determining a salient region in the training image according to the coordinate information and the radiation radius;
calculating the thermal value of each feature point in the salient region to generate an attention image; the thermal value of a feature point within a salient region in the attention image is greater than a thermal value threshold.
In one implementation, one or more instructions in a computer-readable storage medium are loaded by the processor 2201 and when performing an attention-mapping process to redefine class labels in a feature map to obtain a first semantic label image, the following steps are specifically performed:
and replacing the label value of the category label of the corresponding feature point in the feature map by adopting the thermal value of each feature point in the salient region in the attention image to obtain a first semantic label image.
In one implementation, when one or more instructions in the computer-readable storage medium are loaded by the processor 2201 and perform a second semantic segmentation process on the training mind map in the training image by invoking the initial semantic segmentation model to obtain a second semantic label image, the following steps are specifically performed:
calling a down-sampling module in the initial semantic segmentation model to perform down-sampling processing on the training image to obtain a first resolution image;
calling N-order convolutional layers in the initial semantic segmentation model, and performing feature extraction on the first resolution image to obtain N feature maps with different resolutions; n is an integer greater than 1;
calling a feature fusion module in the initial semantic segmentation model, and performing feature fusion processing on the N feature graphs to obtain a target feature graph;
and calling a classification module in the initial semantic segmentation model to classify the target feature map to obtain a second semantic label image.
In one implementation, when one or more instructions in the computer-readable storage medium are loaded by the processor 2201 and a feature fusion module in the initial semantic segmentation model is called to perform feature fusion processing on N feature maps to obtain a target feature map, the following steps are specifically performed:
carrying out up-sampling processing on the feature map with the resolution smaller than the resolution threshold value in the N feature maps to obtain N feature maps with the same resolution;
and carrying out fusion processing on the N feature maps with the same resolution to obtain a target feature map.
Based on the same inventive concept, the principle by which the computer device provided in the embodiment of the present application solves the problem, and its advantageous effects, are similar to those of the mind map identification method in the embodiments of the present application; for brevity, reference may be made to the description of the method embodiments, and details are not repeated here.
Embodiments of the present application also provide a computer program product or a computer program comprising computer instructions stored in a computer-readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the mind map recognition method.
One of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the above embodiments, the implementation may be realized in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions described in the embodiments of the invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored on a computer-readable storage medium or transmitted via one; they may be transmitted from one website, computer, server, or data center to another by wired means (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless means (e.g., infrared, radio, microwave). A computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or data center integrating one or more available media. The available media may be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., DVDs), or semiconductor media (e.g., solid state disks (SSDs)), among others.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily think of the changes or substitutions within the technical scope of the present disclosure, and shall all fall within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (20)

1. A mind map recognition method, comprising:
acquiring a target image containing a target mind map;
displaying, in the target image, a predicted root node in the target mind map in response to a restore operation for the target mind map;
outputting the target mind map in an editable state in response to a confirmation operation on the predicted root node.
2. The method as recited in claim 1, wherein said displaying a predicted root node in said target mind map in said target image comprises:
marking and displaying the predicted root node in the target mind map in the target image;
the annotation display includes: displaying the predicted root node in a visualized display manner; or displaying the predicted root node in the form of a label identifier;
wherein the label identifier is displayed in the area where the predicted root node is located, or is displayed in the target image in the form of an annotation.
3. The method of claim 1, wherein said obtaining a target image containing a target mind map comprises:
displaying a function selection interface, wherein the function selection interface comprises an image-to-mind-map option;
responding to the triggering of the image-to-mind-map option, and displaying an image acquisition interface;
and acquiring a target image containing a target mind map in the image acquisition interface.
4. The method of claim 3, wherein the image acquisition interface includes an image upload option, said acquiring a target image containing a target mind map in the image acquisition interface comprising:
in response to triggering the image upload option, displaying at least one candidate image;
selecting any candidate image from the at least one candidate image, using the selected candidate image as a target image, and using a mind map included in the candidate image as a target mind map included in the target image.
5. The method of claim 3, wherein the image acquisition interface includes an image scanning option, said acquiring a target image containing a target mind map in the image acquisition interface comprising:
and responding to the triggering of the image scanning option, and performing a scanning operation on a target image containing a target mind map to acquire the target image.
6. The method of claim 2, wherein the method further comprises:
in response to the operation of reselecting a root node in the target image, the annotation display of the predicted root node is cancelled in the target image, and the newly reselected root node is displayed in an annotation manner in the target image;
outputting a target mind map in response to a validation operation on the new root node;
wherein the annotation display of the prediction root node is revoked to represent: the prediction root node is not validated.
7. The method as recited in claim 1, wherein said outputting said target mind map further comprises:
updating the target mind map in response to an editing operation on the target mind map;
and responding to the sharing operation of the updated target mind map, and sharing the shared image containing the updated target mind map.
8. The method as recited in claim 1, wherein said displaying a predicted root node in said target mind map in said target image in response to a restore operation for said target mind map comprises:
responding to the restore operation for the target mind map, calling a target semantic segmentation model to perform root node prediction processing on the target image to obtain an initial predicted root node in the target mind map;
determining a node distance between the initial prediction root node and each node in the target mind map according to the initial prediction root node and each node;
and taking the node with the shortest node distance with the initial prediction root node as the prediction root node.
9. The method of claim 8, wherein said invoking the target semantic segmentation model to perform root node prediction processing on the target image resulting in an initial predicted root node in the target mind map comprises:
calling a target semantic segmentation model to perform semantic segmentation processing on the target image to obtain a target semantic label image corresponding to the target image, wherein the target semantic label image comprises a foreground region;
taking a target connected domain of the foreground region as a salient region in the target image;
and taking the central point of the salient region as an initial prediction root node in the target mind map.
10. The method of claim 8, wherein the method further comprises:
acquiring a training image, wherein the training image comprises a training mind map;
performing first semantic segmentation processing on the training mind map in the training image to obtain a first semantic label image; and,
calling an initial semantic segmentation model to perform second semantic segmentation processing on the training mind map in the training image to obtain a second semantic label image;
obtaining a loss function of the initial semantic segmentation model, and calculating a loss value of the loss function according to the first semantic label image and the second semantic label image;
if the loss value meets a preset condition, the initial semantic segmentation model reaches a convergence condition, and the initial semantic segmentation model is used as a trained target semantic segmentation model;
and if the loss value does not meet the preset condition, performing iterative training on the initial semantic segmentation model.
11. The method of claim 10, wherein said performing a first semantic segmentation process on said training mind map in said training image to obtain a first semantic label image comprises:
performing region extraction processing on the training image to obtain an attention image, wherein the attention image comprises a salient region;
extracting features of the attention map image to obtain a feature map;
and redefining the class labels in the feature map according to the attention map image to obtain a first semantic label image.
12. The method of claim 11, wherein said performing region extraction processing on said training image to obtain an attention image comprises:
acquiring a root node annotation file corresponding to the training image, wherein the root node annotation file comprises preset coordinate information of a root node in the training mind map;
performing region extraction processing on the training image based on the coordinate information to generate an attention image, wherein the attention image comprises: a saliency region centered on a coordinate position indicated by the coordinate information.
13. The method of claim 12, wherein the salient region is: a circular area centered on a coordinate position indicated by the coordinate information; the performing region extraction processing on the training image based on the coordinate information to obtain the attention image includes:
acquiring display size information of the training image, and calculating a radiation radius according to the display size information;
determining a salient region in the training image according to the coordinate information and the radiation radius;
calculating the thermal value of each feature point in the salient region to generate an attention image; the thermal value of a feature point within the salient region in the attention image is greater than a thermal value threshold.
14. The method of claim 13, wherein redefining the class labels in the feature map based on the attention map image to obtain a first semantic label image comprises:
and replacing the label value of the category label of the corresponding feature point in the feature map by the thermal value of each feature point in the salient region in the attention image to obtain a first semantic label image.
15. The method of claim 10, wherein said invoking an initial semantic segmentation model to perform a second semantic segmentation process on the training mind map in the training image to obtain a second semantic tagged image comprises:
calling a down-sampling module in the initial semantic segmentation model to perform down-sampling processing on the training image to obtain a first resolution image;
calling N cascaded convolution stages in the initial semantic segmentation model to perform feature extraction on the first resolution image, obtaining N feature maps of different resolutions, wherein N is an integer greater than 1;
calling a feature fusion module in the initial semantic segmentation model, and performing feature fusion processing on the N feature maps to obtain a target feature map;
and calling a classification module in the initial semantic segmentation model to classify the target feature map to obtain a second semantic label image.
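A rough PyTorch sketch of the claim-15 pipeline (down-sampling module, N convolution stages, fusion, classification). The layer widths, kernel sizes, bilinear resampling, and concatenation fusion are assumptions of this sketch, not the patented architecture.

```python
# Sketch of claim 15 (PyTorch assumed); all hyperparameters are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SegModel(nn.Module):
    def __init__(self, n_stages: int = 3, n_classes: int = 2):
        super().__init__()
        chans = [3] + [32 * (2 ** i) for i in range(n_stages)]
        # N cascaded convolution stages, each halving the resolution.
        self.stages = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(chans[i], chans[i + 1], 3, stride=2, padding=1),
                nn.ReLU(inplace=True))
            for i in range(n_stages))
        # Classification module: 1x1 conv over the fused target feature map.
        self.classifier = nn.Conv2d(sum(chans[1:]), n_classes, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = F.interpolate(x, scale_factor=0.5, mode="bilinear",
                          align_corners=False)   # down-sampling module
        feats = []
        for stage in self.stages:                # N maps, N resolutions
            x = stage(x)
            feats.append(x)
        size = feats[0].shape[-2:]               # feature fusion module:
        fused = torch.cat([F.interpolate(f, size=size, mode="bilinear",
                                         align_corners=False)
                           for f in feats], dim=1)
        return self.classifier(fused)            # second semantic label image
```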
16. The method according to claim 15, wherein said invoking a feature fusion module in the initial semantic segmentation model to perform feature fusion processing on the N feature maps to obtain a target feature map comprises:
performing up-sampling processing on those of the N feature maps whose resolution is smaller than a resolution threshold, to obtain N feature maps of the same resolution;
and carrying out fusion processing on the N feature maps with the same resolution to obtain a target feature map.
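The fusion step of claim 16, sketched standalone: maps below the common resolution are up-sampled, then all N maps are fused. Treating the largest resolution present as the threshold, and fusing by channel concatenation, are both assumptions of this sketch.

```python
# Sketch of claim 16's fusion (PyTorch assumed); assumes a standard pyramid
# where the highest-resolution map is largest in both spatial dimensions.
import torch
import torch.nn.functional as F

def fuse(feats: list[torch.Tensor]) -> torch.Tensor:
    target_hw = max(f.shape[-2:] for f in feats)  # acts as resolution threshold
    up = [f if f.shape[-2:] == target_hw
          else F.interpolate(f, size=target_hw, mode="bilinear",
                             align_corners=False)  # up-sample smaller maps
          for f in feats]
    return torch.cat(up, dim=1)                    # target feature map
```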
17. An apparatus for recognizing a mind map, comprising:
an acquisition unit configured to acquire a target image including a target mind map;
a processing unit configured to display a predicted root node of the target mind map in the target image in response to a restore operation on the target mind map;
the processing unit is further configured to output the target mind map in response to a confirmation operation on the predicted root node, wherein the output target mind map is in an editable state.
18. A computer device, comprising:
a processor adapted to execute a computer program;
a computer-readable storage medium, in which a computer program is stored which, when executed by the processor, implements the mind map recognition method according to any one of claims 1-16.
19. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program adapted to be loaded by a processor and to perform the mind map recognition method according to any one of claims 1-16.
20. A computer program product, characterized in that the computer program product comprises computer instructions which, when executed by a processor, implement the mind map recognition method according to any one of claims 1-16.
CN202210421961.4A 2022-04-21 2022-04-21 Thinking guide graph recognition method, device, equipment, medium and program product Pending CN115115740A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210421961.4A CN115115740A (en) 2022-04-21 2022-04-21 Thinking guide graph recognition method, device, equipment, medium and program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210421961.4A CN115115740A (en) 2022-04-21 2022-04-21 Thinking guide graph recognition method, device, equipment, medium and program product

Publications (1)

Publication Number Publication Date
CN115115740A true CN115115740A (en) 2022-09-27

Family

ID=83324735

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210421961.4A Pending CN115115740A (en) 2022-04-21 2022-04-21 Thinking guide graph recognition method, device, equipment, medium and program product

Country Status (1)

Country Link
CN (1) CN115115740A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117035067A (en) * 2023-10-07 2023-11-10 腾讯科技(深圳)有限公司 Thinking guide drawing rendering method and device and electronic equipment
CN117035067B (en) * 2023-10-07 2024-01-23 腾讯科技(深圳)有限公司 Thinking guide drawing rendering method and device and electronic equipment
CN117151442A (en) * 2023-10-31 2023-12-01 中国医学科学院医学信息研究所 Scientific data management plan generation method based on thought guide graph
CN117151442B (en) * 2023-10-31 2024-01-23 中国医学科学院医学信息研究所 Population health field scientific data management generation method based on mind map

Similar Documents

Publication Publication Date Title
CN112101357B (en) RPA robot intelligent element positioning and picking method and system
JP7084457B2 (en) Image generation methods, generators, electronic devices, computer-readable media and computer programs
CN105320428B (en) Method and apparatus for providing image
EP3843004A1 (en) Portrait segmentation method, model training method and electronic device
CN110379020B (en) Laser point cloud coloring method and device based on generation countermeasure network
CN115115740A (en) Thinking guide graph recognition method, device, equipment, medium and program product
CN116382554A (en) Improved drag and drop operations on mobile devices
CN107992937B (en) Unstructured data judgment method and device based on deep learning
CN116049397B (en) Sensitive information discovery and automatic classification method based on multi-mode fusion
KR20200059993A (en) Apparatus and method for generating conti for webtoon
CN113469294B (en) Method and system for detecting icons in RPA robot
CN113411550B (en) Video coloring method, device, equipment and storage medium
CN112149642A (en) Text image recognition method and device
CN111741329B (en) Video processing method, device, equipment and storage medium
US20210012503A1 (en) Apparatus and method for generating image
CN112383824A (en) Video advertisement filtering method, device and storage medium
KR102388777B1 (en) System for providing adjacent building pre-survey service usign 360 degree virtual reality camera
CN113849575B (en) Data processing method, device and system
WO2024041235A1 (en) Image processing method and apparatus, device, storage medium and program product
JP6914724B2 (en) Information processing equipment, information processing methods and programs
CN112560925A (en) Complex scene target detection data set construction method and system
CN107656760A (en) Data processing method and device, electronic equipment
CN112070852A (en) Image generation method and system, and data processing method
CN112165626B (en) Image processing method, resource acquisition method, related equipment and medium
CN111291758B (en) Method and device for recognizing seal characters

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination