WO2024021321A1 - Model generation method and apparatus, electronic device, and storage medium - Google Patents

Model generation method and apparatus, electronic device, and storage medium

Info

Publication number
WO2024021321A1
Authority
WO
WIPO (PCT)
Prior art keywords
layer
model
node
data set
function control
Prior art date
Application number
PCT/CN2022/126426
Other languages
French (fr)
Chinese (zh)
Inventor
李睿宇
石康
原卉
Original Assignee
深圳思谋信息科技有限公司
Priority date
Filing date
Publication date
Application filed by 深圳思谋信息科技有限公司
Publication of WO2024021321A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 Character recognition
    • G06V 30/14 Image acquisition
    • G06V 30/148 Segmentation of character regions
    • G06V 30/15 Cutting or merging image elements, e.g. region growing, watershed or clustering-based techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 11/00 2D [Two Dimensional] image generation
    • G06T 11/20 Drawing from basic elements, e.g. lines or circles
    • G06T 11/206 Drawing of charts or graphs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 Character recognition
    • G06V 30/19 Recognition using electronic means
    • G06V 30/191 Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V 30/19173 Classification techniques

Definitions

  • The embodiments of the present application relate to the field of artificial intelligence technology, and in particular to a model generation method and apparatus, an electronic device, and a storage medium.
  • Artificial intelligence (AI) is widely applied to tasks such as face recognition, image classification, object detection, and speech recognition.
  • AI development platforms are commonly used to provide users with services such as the selection, construction, verification, and optimization of AI models for particular task goals.
  • Existing AI development platforms, however, are less flexible. How to obtain a model with diverse functions that meets the data-processing requirements of complex scenarios is a problem in urgent need of a solution.
  • In a first aspect, a model generation method is provided: a sample image data set is obtained, and a tree-structured model is determined based on the sample image data set.
  • The nodes (function controls) in the tree-structured model are of at most the following four types: image segmentation function controls, image classification function controls, image target detection function controls, and optical character recognition function controls.
  • The tree-structured model can contain one or more of the above four function controls, and the same function control may also appear one or more times, which provides richer node types. Users can select all or part of the functions in the model for output according to the actual situation. Any node in a given layer of the tree-structured model is connected to exactly one node in the layer above it, rather than to all nodes in that layer; nodes within the same layer are not connected to one another; and the input of each node is the output of the node in the previous layer to which it is connected.
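The connection rules above (exactly one parent per node, no links inside a layer, each node consuming its parent's output) can be sketched as a small data structure. This is an illustrative sketch only, not code from the application; the node names and `process` callables are hypothetical placeholders for the four function controls.

```python
class Node:
    """One function control (e.g. segmentation, detection) in the tree-structured model."""

    def __init__(self, name, process):
        self.name = name
        self.process = process   # callable: input data -> output data
        self.parent = None       # each node connects to exactly one upper-layer node
        self.children = []

    def add_child(self, child):
        child.parent = self      # never connected to all nodes of the upper layer
        self.children.append(child)
        return child

    def run(self, data):
        # The input of each node is the output of the parent node it is connected to.
        if self.parent is not None:
            data = self.parent.run(data)
        return self.process(data)


# Toy example: a segmentation node feeding a detection node.
root = Node("segmentation", lambda d: d + ["segmented"])
det = root.add_child(Node("detection", lambda d: d + ["detected"]))
print(det.run(["image"]))   # ['image', 'segmented', 'detected']
```

The user can take the output of any node (here, `root.run(...)` or `det.run(...)`) as the model's output, matching the "all or part of the functions" selection described above.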
  • a method for generating a tree-structured model is provided.
  • the input of each node in the model is the output of the node in the previous layer connected to the node.
  • Users can select the functions corresponding to different nodes (function controls) according to their needs to process the sample image data set.
  • The four node types that the model may include are well suited to industrial scenarios, which to a certain extent simplifies the model generation process, improves the flexibility of model generation, and can cope with more complex data processing scenarios while reducing the difficulty for users to learn and use the platform.
  • In response to a user's first operation, the control options of a certain layer of the tree structure are displayed, and the control options include the above four function controls (node types).
  • In response to the user's second operation, one or more of the four function controls are selected.
  • a tool for performing sample processing on the sample image data set is displayed, and the user can use the sample processing tool to perform some auxiliary operations on the sample image data set.
  • In some embodiments, the tree-structured model includes a first sub-model, and the first sub-model includes an m-th layer and an (m+1)-th layer, where the node of the m-th layer is an image segmentation function control and the node of the (m+1)-th layer is an image target detection function control, m being a positive integer less than N.
  • a tree-structured model may include multiple sub-models.
  • the first submodel is one of multiple submodels. Any functional control in the tree-structured model can be regarded as a sub-model.
  • the first sub-model includes an image segmentation function control and an image target detection function control. That is, the first submodel includes two functional controls in series.
  • this application does not limit the specific functional controls included in the first sub-model, and the specific functional controls can be determined according to the actual needs of the user.
  • the tree-structured model generation method can be intuitively displayed on the user interface, which can facilitate the user to perform the model generation process and reduce the difficulty of user learning.
  • In some embodiments, the tree-structured model includes a second sub-model, and the second sub-model includes the m-th layer, the (m+1)-th layer, and the (m+2)-th layer, where the node of the m-th layer is an image segmentation function control, the node of the (m+1)-th layer is an image target detection function control, and the node of the (m+2)-th layer is an optical character recognition function control, m being a positive integer less than N.
  • In some embodiments, an optical character recognition function control is subsequently added after the image target detection function control of the first sub-model, so that the second sub-model includes the first sub-model and the newly added optical character recognition function control.
  • the second sub-model includes image segmentation function controls, image target detection function controls, and optical character recognition function controls.
  • this application does not limit the specific functional controls included in the second sub-model, and the specific functional controls can be determined according to the actual needs of the user.
  • The second sub-model is connected in series with the first sub-model. According to the actual situation, the output data of the first sub-model or the output data of the second sub-model can be selected as the final output data, or the output data of any function control can be selected as the final output data.
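The series arrangement described here (sub-models chained one after another, with the freedom to take any control's output as the final output) can be sketched as follows. This is a hypothetical illustration; the control names and toy operations stand in for the trained segmentation, detection, and OCR stages.

```python
def make_submodel(controls):
    """Build a sub-model from an ordered list of (name, fn) function controls in series."""
    def run(data, output_at=None):
        outputs = {}
        for name, fn in controls:
            data = fn(data)          # each control's input is the previous control's output
            outputs[name] = data
        # By default return the last control's output; otherwise any chosen control's output.
        return outputs[output_at] if output_at else data
    return run


# First sub-model (segmentation -> detection) extended into the second (+ OCR).
second_submodel = make_submodel([
    ("segmentation", lambda d: d + ["seg"]),
    ("detection",    lambda d: d + ["det"]),
    ("ocr",          lambda d: d + ["ocr"]),
])
print(second_submodel(["img"]))                          # ['img', 'seg', 'det', 'ocr']
print(second_submodel(["img"], output_at="detection"))   # ['img', 'seg', 'det']
```

Passing `output_at` mirrors selecting the output of any function control as the final output.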
  • In some embodiments, the tree-structured model includes a third sub-model, and the third sub-model includes a j-th layer and a (j+1)-th layer, where the node of the j-th layer is an optical character recognition control and the node of the (j+1)-th layer is an image classification control, j being a positive integer less than N.
  • the third sub-model is connected in parallel with the first sub-model (or second sub-model).
  • the third sub-model includes an optical character recognition function control and an image classification function control.
  • this application does not limit the specific functional controls included in the third sub-model, and the specific functional controls can be determined according to the actual needs of the user.
  • Users can modify the tree-structured model in real time according to actual needs. For example, when the user performs a deletion operation on the image target detection function control in the second sub-model, that function control and all subsequent function controls are deleted. In other words, with a single deletion operation the user can delete the selected function control and every function control after it, which improves the flexibility of model generation.
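The single-operation cascade deletion (removing a function control removes every control after it) amounts to deleting a subtree. A minimal sketch, with the tree held as a child list per node and hypothetical control names:

```python
def delete_with_descendants(children, parent_of, name):
    """Remove control `name` and every control after it (its whole subtree)."""
    stack = [name]
    while stack:
        node = stack.pop()
        stack.extend(children.pop(node, []))   # queue this control's children too
        parent = parent_of.pop(node, None)
        if parent is not None and node in children.get(parent, []):
            children[parent].remove(node)      # detach from the remaining tree


# seg -> det -> ocr, with an unrelated cls branch.
children = {"seg": ["det"], "det": ["ocr"], "ocr": [], "cls": []}
parent_of = {"det": "seg", "ocr": "det"}
delete_with_descendants(children, parent_of, "det")
print(sorted(children))   # ['cls', 'seg']: 'det' and the later 'ocr' are both gone
```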
  • In some embodiments, the sample processing is one of the following: labeling processing, sample selection processing, scaling processing, and preprocessing.
  • the function corresponding to the image segmentation function control is a segmentation algorithm based on deep learning.
  • In a second aspect, an electronic device is provided, including one or more processors and one or more memories; the one or more memories store one or more computer programs, and the one or more computer programs comprise instructions that, when executed by the one or more processors, cause the electronic device to perform the method of the first aspect or any possible implementation thereof.
  • A third aspect provides an apparatus for model generation, including a processor coupled to a memory; the memory is used to store a computer program, and the processor is used to run the computer program, so that the apparatus for model generation performs the method of the first aspect or any possible implementation thereof.
  • the device for model generation further includes one or more of the memory and a transceiver, the transceiver being used to receive signals and/or transmit signals.
  • In a fourth aspect, a computer-readable storage medium is provided, including a computer program or instructions that, when run on a computer, cause the method of the first aspect or any possible implementation thereof to be performed.
  • In a fifth aspect, a computer program product is provided, including a computer program or instructions that, when run on a computer, cause the method of the first aspect or any possible implementation thereof to be performed.
  • a sixth aspect provides a computer program that, when run on a computer, causes the method in the first aspect and any possible implementation thereof to be executed.
  • Figure 1 is a schematic diagram of the system architecture provided by an embodiment of the present application.
  • Figure 2 is a framework diagram of the model generation method provided by the embodiment of the present application.
  • Figure 3 is a schematic diagram of a model generation method provided by an embodiment of the present application.
  • Figure 4 is a schematic diagram of a functional module provided by an embodiment of the present application.
  • Figure 5 is a processing flow chart of a data set provided by an embodiment of the present application.
  • Figure 6 is a schematic diagram of image segmentation provided by an embodiment of the present application.
  • Figure 7 is a processing flow chart of another data set provided by the embodiment of the present application.
  • Figure 8 is a processing flow chart of another data set provided by the embodiment of the present application.
  • Figure 9 is a schematic diagram of the data set processing flow provided by the embodiment of this application.
  • Figure 10 is a schematic diagram of another data set processing flow provided by an embodiment of the present application.
  • Figure 11 is a schematic diagram of another data set processing flow provided by an embodiment of the present application.
  • Figure 12 is a schematic diagram of another data set processing flow provided by an embodiment of the present application.
  • Figure 13 is a schematic diagram of another data set processing flow provided by an embodiment of the present application.
  • Figure 14 is a schematic diagram of another data set processing flow provided by an embodiment of the present application.
  • Figure 15 is a schematic diagram of another data set processing flow provided by an embodiment of the present application.
  • Figure 16 is a schematic flow chart of a model generation method provided by an embodiment of the present application.
  • the model generation method provided by this application can be applied to the system architecture diagram shown in Figure 1.
  • the terminal device 11 communicates with the server 12 through the network.
  • The terminal device 11 sends the sample image data set and the user's intention to the server 12.
  • The user intention is used to represent the processing the user requires on the sample image data set.
  • The server 12 uses the sample image data set to train the corresponding model according to the user's intention, and finally generates the model the user actually needs. That is to say, the user first uploads the data set to be processed to the AI development platform provided by the embodiments of this application, and then selects the corresponding processing operations according to actual needs to generate the model.
  • The terminal device 11 in the above system architecture diagram includes but is not limited to mobile phones, tablet computers, wearable electronic devices with wireless communication functions (such as smart watches), and the like.
  • Exemplary embodiments of portable electronic devices include, but are not limited to, portable electronic devices carrying any of a variety of operating systems.
  • the above-mentioned electronic device may not be a portable electronic device, but a desktop computer.
  • the server 12 can be implemented as an independent server or a server cluster composed of multiple servers.
  • The model generation framework diagram 200 includes an image segmentation module 210, an image classification module 220, an image target detection module 230, and an optical character recognition (OCR) module 240.
  • Image segmentation module 210 may be used to perform image segmentation.
  • Image segmentation is a technique and process that divides an image into several specific regions with unique properties and extracts objects of interest. It is a key step from image processing to image analysis.
  • Existing image segmentation methods are mainly divided into the following categories: threshold-based segmentation methods, region-based segmentation methods, edge-based segmentation methods, and segmentation methods based on specific theories.
  • In the embodiments of this application, a deep-learning-based segmentation algorithm is mainly used.
  • image segmentation is the process of dividing a digital image into disjoint regions.
  • the process of image segmentation is also a labeling process, which can assign the same number to pixels belonging to the same area.
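The "same number for pixels of the same area" idea can be illustrated with a toy flood-fill labeller over a binary mask. This is only a stand-in for the deep-learning segmentation the application actually uses; the mask and labels are hypothetical.

```python
def label_regions(mask):
    """Assign the same integer label to every pixel of a connected foreground region."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    current = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not labels[y][x]:
                current += 1                       # a new region gets a new number
                stack = [(y, x)]
                while stack:
                    cy, cx = stack.pop()
                    if 0 <= cy < h and 0 <= cx < w and mask[cy][cx] and not labels[cy][cx]:
                        labels[cy][cx] = current   # same number, same region
                        stack += [(cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)]
    return labels


mask = [[1, 1, 0],
        [0, 0, 0],
        [0, 1, 1]]
print(label_regions(mask))   # [[1, 1, 0], [0, 0, 0], [0, 2, 2]]
```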
  • Image segmentation can be applied to the detection and edge recognition of detected objects down to the pixel level. For example, it can identify defects in fine parts such as cracked areas on silicon wafers and damaged areas in bearings.
  • Image classification module 220 may be used to perform image classification.
  • Image classification is an image processing method that distinguishes different categories of targets based on the different characteristics reflected in the image information.
  • Image classification can be applied to classify and judge inspected materials; for example, binary classification based on whether a material is qualified, classification by the color of the test object or the type of food under inspection, defect subdivision, or classifying test objects by material.
  • the image object detection module 230 may be used to perform image object detection.
  • Image target detection draws on image processing and pattern recognition to locate objects of interest in an image, determine the specific category of each object, and give the bounding box of each object. Image target detection can be used to locate and classify targets in inspection materials, and is suitable for multi-target detection, small-target detection, counting, and the like; for example, determining the number of pharmaceutical pills or the location of defects in components.
  • the optical character recognition module 240 may be used to perform optical character recognition.
  • Optical character recognition refers to the process of analyzing and recognizing image files of text materials to obtain text and layout information.
  • Optical character recognition can be applied to single-character labeling and recognition, and multi-character labeling and recognition. It can break the limitations of traditional methods and solve complex character recognition problems such as curve character recognition, low contrast character recognition, and large character recognition. For example, text on fine parts can be recognized.
  • FIG 3 shows a schematic diagram of a model generation method provided by an embodiment of the present application.
  • the data set 300 includes but is not limited to a picture data set, a video data set, a text data set, etc.
  • the structure generated by this model is a tree structure.
  • the user can select modules with different functions for processing the data set 300 according to actual needs, such as image segmentation module, image classification module, image target detection module or optical character recognition module, etc. In Figure 3, only the above four modules are taken as examples for explanation. The user can select the required modules to build tree branches to implement the next step of processing the data set 300.
  • Three serial or parallel schemes generated by the model are shown in Figure 3.
  • the data set 300 is subjected to image segmentation, image target detection, optical character recognition and image classification processing in sequence.
  • the data set 300 is subjected to image segmentation, image target detection and optical character recognition processing in sequence.
  • image classification processing is performed on the data set 300.
  • the obtained segmentation data will be transmitted to the image object detection module.
  • the input data of each module is the output data of the previous module.
  • users can freely combine and connect the above four modules according to actual needs.
  • other functional modules will not be connected after image classification.
  • the user can selectively generate a model corresponding to the tree structure. For example, output the software development kit (SDK) corresponding to the model, etc.
  • each sub-node in the data structure diagram corresponds to a complete solution. However, users can choose to output all or part of the solution according to actual needs.
  • the user can select or delete functional modules through an input device (such as a mouse).
  • four functional modules including image segmentation, image classification, image target detection, and optical character recognition, will appear for the user to choose.
  • image segmentation process can be performed on the data set 300.
  • image classification process can be performed on the data set 300.
  • image object detection can be performed on the data set 300.
  • optical character recognition can be performed on the data set 300.
  • the above four functional modules will also appear for the user to choose the specific task to be performed next. That is to say, after each processing, the user can choose a specific processing method for the next step.
  • the specific processing method can be one or more, and this application does not limit this.
  • When the user right-clicks the image segmentation module, the image segmentation module is deleted.
  • When the user right-clicks the image target detection module 320, the image target detection module 320, the optical character recognition modules 330 and 360, and the image classification module 340 are all deleted.
  • the image segmentation module 310 and image classification module 350 in Figure 3 will be retained.
  • FIG. 6 is an example of image data provided by this application.
  • the segmented image 600 may include four regions 610, 620, 630, and 640.
  • the four areas in Figure 6 are only examples, and other areas may also be included.
  • Image segmentation can assign the same number to pixels belonging to the same area.
  • the target of interest is located from the segmented areas of the 100 images in step S501, the specific category of each target is determined, and the bounding box of each target is given.
  • the target may be a trademark logo, or a description label, etc.
  • the 100 images in step S503 are classified.
  • the area numbered A in Figure 6 is divided into one category
  • the area numbered B is divided into another category
  • the area numbered C is divided into another category
  • the area numbered D is divided into another category.
  • a model can be generated, which can be used to process certain images according to the process of image segmentation, image target detection, optical character recognition and image classification.
  • Image segmentation can assign the same number to pixels belonging to the same area.
  • the target of interest is located from the segmented areas of the 100 images in step S701, the specific category of each target is determined, and the bounding box of each target is given.
  • the target may be a trademark logo, or a description label, etc.
  • a model can be generated, which can be used to process images according to the process of image segmentation, image target detection and optical character recognition.
  • the 100 images in the data set 300 are separated into different categories of areas in the image, and marked with different numbers.
  • a model is generated, which can directly perform image classification processing on images.
  • Figures 9, 10, 11 and 12 show schematic diagrams of data sets being processed in a tree structure model.
  • the processing flow of the tree structure model includes: on the one hand, the data set undergoes image target detection (for example, detection 1 in Figure 9), and then undergoes optical character recognition (for example, OCR1 in Figure 9) processing.
  • the data set is processed by image object detection (eg, detection 1 in Figure 9) and then image classification (eg, classification 1 in Figure 9).
  • detection 1, OCR1 and classification 1 in Figure 9 can be considered as different functional controls.
  • FIG. 10 shows a schematic diagram of detection 1 in the above processing flow.
  • The image target detection corresponding to detection 1 is used to locate a specific area in the image; for example, the area currently set is the area above the characters "9BC3".
  • OCR1 performs character recognition on the area located by detection 1.
  • OCR1 can also perform post-processing on the results output by detection 1.
  • the post-processing can be to offset the determined area by a fixed amount so that the character "9BC3" area in the figure is located.
  • The above post-processing can further adjust the results of the four types of processing applied to the sample image data set.
  • the area corresponding to the processed output result is offset or scaled by a certain amount to make the area corresponding to the output result more accurate.
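The offset/scale post-processing on a located region can be sketched as a simple box transform. The (x, y, width, height) box and the concrete offsets are hypothetical, chosen only to illustrate shifting a region (e.g. down onto the "9BC3" characters):

```python
def adjust_region(box, dx=0, dy=0, scale=1.0):
    """Shift an (x, y, w, h) box by (dx, dy) and scale its size by `scale`."""
    x, y, w, h = box
    return (x + dx, y + dy, w * scale, h * scale)


located = (40, 10, 80, 20)                # region found just above the characters
print(adjust_region(located, dy=15))      # shifted down: (40, 25, 80.0, 20.0)
print(adjust_region(located, scale=2.0))  # enlarged: (40, 10, 160.0, 40.0)
```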
  • the image is segmented into specific areas, and the user interface may display the segmented specific areas. Users can visually check whether the sample image data set is segmented accurately. When the specific segmented area deviates from the area that the user needs to be segmented, the specific area corresponding to the image segmentation process can be further adjusted to make the process more accurate.
  • the images are classified into different areas.
  • the classification of the sample image data set may be inaccurate, and areas that should not be in the same category are classified into the same category. Users can adjust the image classification by further moving the area corresponding to the output result, making the classification of the sample image data set more accurate.
  • objects of interest to the user are marked. For example, count the number of objects of interest to the user in the sample image data.
  • the user can further adjust and manually mark the targets that are not counted to improve the accuracy of quantity statistics.
  • Further processing of the sample image data set includes but is not limited to labeling processing, sample selection processing, scaling processing, and preprocessing. Further processing can be understood as the process of optimizing the processed data, which allows the final trained model to process data more accurately.
  • FIG. 12 shows a schematic diagram of classification 1 in the above processing flow.
  • Figures 13, 14 and 15 show schematic diagrams of data sets being processed in another tree structure model.
  • the processing flow of the tree structure model includes: the data set undergoes image target detection (for example, detection 1 in Figure 13), and then image segmentation (for example, segmentation 1 in Figure 13). Among them, detection 1 and segmentation 1 in Figure 13 can be considered as different functional controls.
  • FIG. 14 shows a schematic diagram of detection 1 in the above processing flow.
  • the image target detection corresponding to detection 1 is used to locate a specific area in the image. For example, what is currently set is the position of the diode in the image.
  • As shown in FIG. 15, a schematic diagram of segmentation 1 in the above processing flow is provided. Segmentation 1 further identifies the defective area of the positioned diode and segments that area.
  • Figure 16 shows a model generation method 1600 provided by the embodiment of the present application. This method can be applied in the framework shown in Figure 2. The method 1600 is described in detail below.
  • S1601: Obtain a sample image data set.
  • S1602: Determine a tree-structured model based on the sample image data set.
  • The tree structure includes N layers, and each of the N layers includes at least one node. The input of the node of the first layer is the sample image data set, and the input of a node of the i-th layer is the output of one of the nodes of the (i-1)-th layer, where N is a positive integer greater than 1 and i = 2, ..., N. Each node is one of the following controls: an image segmentation function control, an image classification function control, an image target detection function control, or an optical character recognition function control.
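The layer constraint in S1602 (N layers, layer 1 fed by the data set, every later node fed by exactly one node of the previous layer) can be checked mechanically. A sketch under the assumption that each layer is a list of (parent index in previous layer, control name) pairs, with hypothetical names:

```python
def validate_layers(layers):
    """Return True if every node points at exactly one node of the previous layer."""
    for i, layer in enumerate(layers):
        if not layer:                       # each of the N layers has at least one node
            return False
        for parent_idx, _name in layer:
            if i == 0:
                if parent_idx is not None:  # layer 1's input is the sample data set
                    return False
            elif not (0 <= parent_idx < len(layers[i - 1])):
                return False                # must reference one previous-layer node
    return True


layers = [
    [(None, "segmentation")],              # layer 1
    [(0, "detection"), (0, "ocr")],        # layer 2: both fed by layer-1 node 0
    [(1, "classification")],               # layer 3: fed by layer-2 node 1 (ocr)
]
print(validate_layers(layers))   # True
```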
  • the embodiments of this application do not limit the specific number of sample image data sets.
  • The nodes (function controls) in the tree-structured model include at most four types of function controls: image segmentation function controls, image classification function controls, image target detection function controls, and optical character recognition function controls.
  • The tree-structured model can contain one or more of the above four function controls, and the same function control may also appear one or more times, which provides richer node types. Users can select all or part of the functions in the model for output according to the actual situation. Any node in a given layer of the tree-structured model is connected to exactly one node in the layer above it, rather than to all nodes in that layer; nodes within the same layer are not connected to one another; and the input of each node is the output of the node in the previous layer to which it is connected.
  • In response to a user's first operation, the control options of a certain layer of the tree structure are displayed, and the control options include the above four function controls (node types).
  • In response to the user's second operation, one or more of the four function controls are selected.
  • a tool for performing sample processing on the sample image data set is displayed, and the user can use the sample processing tool to perform some auxiliary operations on the sample image data set.
  • In some embodiments, the tree-structured model includes a first sub-model, and the first sub-model includes an m-th layer and an (m+1)-th layer, where the node of the m-th layer is an image segmentation function control and the node of the (m+1)-th layer is an image target detection function control, m being a positive integer less than N.
  • a tree-structured model may include multiple sub-models.
  • the first submodel is one of multiple submodels. Any functional control in the tree-structured model can be regarded as a sub-model.
  • the first sub-model includes an image segmentation function control and an image target detection function control. That is, the first submodel includes two functional controls in series.
  • this application does not limit the specific functional controls included in the first sub-model, and the specific functional controls can be determined according to the actual needs of the user.
  • In some embodiments, the tree-structured model includes a second sub-model, and the second sub-model includes the m-th layer, the (m+1)-th layer, and the (m+2)-th layer, where the node of the m-th layer is an image segmentation function control, the node of the (m+1)-th layer is an image target detection function control, and the node of the (m+2)-th layer is an optical character recognition function control, m being a positive integer less than N.
  • an optical character recognition function control is added after the image target detection function control of the first sub-model, so the second sub-model includes the first sub-model and the newly added optical character recognition function control.
  • the second sub-model includes image segmentation function controls, image target detection function controls, and optical character recognition function controls.
  • this application does not limit the specific functional controls included in the second sub-model, and the specific functional controls can be determined according to the actual needs of the user.
  • the second sub-model is connected in series with the first sub-model. According to the actual situation, the output data of the first sub-model or of the second sub-model can be selected as the final output data, or the output data of any function control can be selected as the final output data.
  • the tree-structured model includes a third sub-model, and the third sub-model includes a j-th layer and a (j+1)-th layer, where the node of the j-th layer is the optical character recognition control and the node of the (j+1)-th layer is the image classification control, j being a positive integer less than N.
  • the third sub-model is connected in parallel with the first sub-model (or second sub-model).
  • the third sub-model includes an optical character recognition function control and an image classification function control.
  • this application does not limit the specific functional controls included in the third sub-model, and the specific functional controls can be determined according to the actual needs of the user.
  • users can modify the tree-structured model in real time according to actual needs. For example, when the user performs a deletion operation on the image target detection function control in the second sub-model, the image target detection function control and all subsequent function controls can be deleted. In other words, through a single deletion operation the user can delete the corresponding function control together with every function control after it, which improves the flexibility of model generation.
  • the sample processing is one of: labeling processing, sample selection processing, scaling processing, preprocessing.
  • the function corresponding to the image segmentation function control is a segmentation algorithm based on deep learning.
  • a model generation method based on a tree structure is provided. Users can choose different functional modules to process data according to specific needs, and can also choose multiple branch functions for further processing of the processed data. This method improves the flexibility of model generation and can cope with more complex data processing scenarios. At the same time, users can intuitively and conveniently select different functional modules on the user interface according to specific needs, which reduces the difficulty of learning and using the model.
  • Embodiments of the present application provide a computer program product.
  • when the computer program product is run on an electronic device, it causes the electronic device to execute the technical solutions in the above embodiments.
  • the implementation principles and technical effects are similar to the above-mentioned method-related embodiments, and will not be described again here.
  • Embodiments of the present application provide a readable storage medium.
  • the readable storage medium contains instructions.
  • when the instructions are run on an electronic device, the electronic device executes the technical solutions of the above embodiments.
  • the implementation principles and technical effects are similar and will not be described again here.
  • Embodiments of the present application provide a chip, which is used to execute instructions. When the chip is running, it executes the technical solutions in the above embodiments. The implementation principles and technical effects are similar and will not be described again here.
  • the disclosed systems, devices and methods can be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division; in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the coupling, direct coupling, or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection between devices or units may be electrical, mechanical, or in other forms.
  • units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in the embodiments of the present application may be integrated into one processing unit, may exist physically alone, or two or more units may be integrated into one unit.
  • if this function is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • the technical solutions of the embodiments of the present application, in essence, or the part that contributes to the existing technology, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods in the various embodiments of the present application.
  • the aforementioned storage media include: USB flash drives, removable hard disks, read-only memory (ROM), random access memory (RAM), magnetic disks, optical disks, and other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)
  • Character Input (AREA)

Abstract

Embodiments of the present application provide a model generation method and apparatus, an electronic device, and a storage medium. The method comprises: acquiring a sample image data set; and determining a tree-structured model according to the sample image data set, the tree structure comprising N layers, each of the N layers comprising at least one node, the input of a node of the first layer being the sample image data set, and the input of a node of the i-th layer being the output of one of the nodes in the (i-1)-th layer, where N is a positive integer greater than 1 and i = 2, ..., N. Each node is one of the following controls: an image segmentation function control, an image classification function control, an image target detection function control, or an optical character recognition function control. With this method, a user can visually and conveniently select different functional modules on a user interface to process data; the flexibility of model generation is improved, and relatively complex data processing scenarios can be handled conveniently.

Description

Model generation method, apparatus, electronic device, and storage medium
This application claims priority to the Chinese patent application filed with the China Patent Office on July 26, 2022, with application number 202210881785.2 and titled "Model generation method, apparatus, electronic device and storage medium", the entire content of which is incorporated into this application by reference.
Technical field
The embodiments of the present application relate to the field of artificial intelligence technology, and in particular to a model generation method, apparatus, electronic device, and storage medium.
Background
At present, artificial intelligence (AI) has received widespread attention from both academia and industry, and AI technology is widely applied owing to its fast processing speed and high processing accuracy, for example in face recognition, image classification, object detection, and speech recognition.

Before applying an AI model, a suitable AI model first needs to be built or selected, and then trained and optimized. AI development platforms are now commonly used to provide users with services such as the selection, construction, verification, and tuning of AI models for given task goals.

However, AI development platforms offer limited flexibility. How to obtain models with diverse functions, and how to meet the data processing requirements of complex scenarios, are problems that urgently need to be solved.
Summary
In a first aspect, a model generation method is provided, comprising: acquiring a sample image data set; and determining a tree-structured model according to the sample image data set, the tree structure comprising N layers, each of the N layers including at least one node, where the input of a node in the first layer is the sample image data set and the input of a node in the i-th layer is the output of one of the nodes in the (i-1)-th layer, N being a positive integer greater than 1 and i = 2, ..., N; each node is one of the following controls: an image segmentation function control, an image classification function control, an image target detection function control, or an optical character recognition function control.
It should be understood that the embodiments of the present application do not limit the specific number of sample image data sets. The nodes (function controls) in the tree-structured model can be of at most the following four types: image segmentation function control, image classification function control, image target detection function control, and optical character recognition function control. The tree-structured model can contain one or more of these four function controls, and the same function control can also appear one or more times, which provides richer node types. Users can select all or some of the functions in the model for output according to the actual situation. Any node in a given layer of the tree-structured model is connected to only one node in the layer above it, not to all nodes in that layer. Moreover, nodes within the same layer are not connected to each other, and the input of each node is the output of the node in the previous layer to which it is connected.
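The connectivity rule above (each node has exactly one parent, no links within a layer, and a node's input is its parent's output) can be sketched in a few lines of Python. This is an illustrative toy, not the patent's implementation; the `Node` class and the stand-in "controls" are invented for the example.

```python
class Node:
    """One functional control in the tree-structured model."""
    def __init__(self, name, func, parent=None):
        self.name = name
        self.func = func          # the control's processing function
        self.parent = parent      # exactly one parent (None for layer 1)
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def run(self, root_input):
        """This node's output: its function applied to the parent's output."""
        if self.parent is None:   # a layer-1 node takes the data set itself
            return self.func(root_input)
        return self.func(self.parent.run(root_input))

# Toy stand-ins for segmentation / detection / OCR on a list of strings:
segment = Node("segmentation", lambda xs: [x.strip() for x in xs])
detect  = Node("detection", lambda xs: [x for x in xs if x], parent=segment)
ocr     = Node("ocr", lambda xs: [x.upper() for x in xs], parent=detect)

print(ocr.run([" cat ", "", " dog "]))   # ['CAT', 'DOG']
```

Because any node can be queried with `run`, the output of any control in the tree can be taken as a final output, matching the selection behavior described above.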
In the embodiments of the present application, a method for generating a tree-structured model is provided. The input of each node in the model is the output of the node in the previous layer connected to it. Users can select the functions corresponding to different nodes (function controls) as needed to process the sample image data set. Moreover, the four node types that the model may include are well suited to industrial scenarios; to a certain extent this simplifies the model generation process, improves the flexibility of model generation, makes it possible to cope with more complex data processing scenarios, and reduces the difficulty for users to learn and use the model.
With reference to the first aspect, in some implementations of the first aspect, determining the tree-structured model according to the sample image data set includes: in response to a first operation by the user, displaying the control options of the k-th layer of the tree structure; in response to a second operation by the user, determining the node of the k-th layer from the control options of the k-th layer; according to the determined node of the k-th layer, displaying a tool for performing sample processing on the sample image data set; and training the tree-structured model using the sample image data set after the sample processing, where k = 1, 2, ..., N.
It should be understood that, in response to the user's first operation, the control options of a given layer of the tree structure are displayed, and the control options include the above four function controls (node types). Through the user's second operation, one or more of the four function controls are selected. In addition, a tool for performing sample processing on the sample image data set is displayed, and the user can use this tool to carry out auxiliary operations on the sample image data set.
With reference to the first aspect, in some implementations of the first aspect, the tree-structured model includes a first sub-model, and the first sub-model includes an m-th layer and an (m+1)-th layer, where the node of the m-th layer is the image segmentation function control and the node of the (m+1)-th layer is the image target detection function control, m being a positive integer less than N.

It should be understood that a tree-structured model may include multiple sub-models, the first sub-model being one of them; any function control in the tree-structured model can be regarded as a sub-model. In an example, the first sub-model includes an image segmentation function control and an image target detection function control; that is, the first sub-model includes two function controls in series. Of course, this application does not limit the specific function controls included in the first sub-model, which can be determined according to the actual needs of the user.

In the embodiments of the present application, the tree-structured model generation method can be displayed intuitively on the user interface, which makes it easier for the user to carry out the model generation process and reduces the learning difficulty.
With reference to the first aspect, in some implementations of the first aspect, the tree-structured model includes a second sub-model, and the second sub-model includes the m-th layer, the (m+1)-th layer, and the (m+2)-th layer, where the node of the m-th layer is the image segmentation function control, the node of the (m+1)-th layer is the image target detection function control, and the node of the (m+2)-th layer is the optical character recognition function control, m being a positive integer less than N.
It should be understood that an optical character recognition function control is added after the image target detection function control of the first sub-model, so the second sub-model includes the first sub-model and the newly added optical character recognition function control. In an example, the second sub-model includes an image segmentation function control, an image target detection function control, and an optical character recognition function control. Of course, this application does not limit the specific function controls included in the second sub-model, which can be determined according to the actual needs of the user. This can be understood as the second sub-model being connected in series with the first sub-model: according to the actual situation, the output data of the first sub-model or of the second sub-model can be selected as the final output data, or the output data of any function control can be selected as the final output data.
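The idea that the output of any function control in a serial chain can serve as the final output can be sketched as follows. The helper and the toy stage functions are invented for illustration; they stand in for the real segmentation, detection, and OCR controls.

```python
def run_chain(stages, data):
    """stages: list of (name, func) applied in series.
    Returns a dict mapping each stage name to that stage's output,
    so the caller can pick any control's output as the final result."""
    outputs = {}
    for name, func in stages:
        data = func(data)
        outputs[name] = data
    return outputs

# Toy stand-ins for the serial chain segmentation -> detection -> OCR:
chain = [("segmentation", lambda xs: [x.lower() for x in xs]),
         ("detection",    lambda xs: sorted(xs)),
         ("ocr",          lambda xs: "".join(xs))]

results = run_chain(chain, ["B", "A"])
print(results["detection"])   # ['a', 'b'] — an intermediate output chosen as final
```

Running the chain once and keeping every stage's result is one simple way to let the user select the first sub-model's output, the second sub-model's output, or any single control's output after the fact.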
With reference to the first aspect, in some implementations of the first aspect, the tree-structured model includes a third sub-model, and the third sub-model includes a j-th layer and a (j+1)-th layer, where the node of the j-th layer is the optical character recognition control and the node of the (j+1)-th layer is the image classification control, j being a positive integer less than N.

It should be understood that the third sub-model is connected in parallel with the first sub-model (or the second sub-model). In an example, the third sub-model includes an optical character recognition function control and an image classification function control. Of course, this application does not limit the specific function controls included in the third sub-model, which can be determined according to the actual needs of the user.
Optionally, the user can modify the tree-structured model in real time according to actual needs. For example, when the user performs a deletion operation on the image target detection function control in the second sub-model, the image target detection function control and all function controls after it can be deleted. In other words, through a single deletion operation the user can delete the corresponding function control together with every function control after it, which improves the flexibility of model generation.
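A minimal sketch of the single-operation deletion described above, representing the model as nested dicts (an assumption made for illustration, not the patent's data structure): removing a node's key drops that node and, with it, every control below it.

```python
def build():
    # A toy model: segmentation -> detection -> ocr, with a parallel
    # classification branch at the top level.
    return {"segmentation": {"detection": {"ocr": {}}},
            "classification": {}}

def delete(tree, name):
    """Remove the control `name` wherever it occurs; the nested dict
    (i.e. all of its descendant controls) is removed along with it."""
    for key in list(tree):
        if key == name:
            del tree[key]
        else:
            delete(tree[key], name)

model = build()
delete(model, "detection")
print(model)   # {'segmentation': {}, 'classification': {}}
```

Deleting "detection" also removes "ocr" in one step, mirroring the cascade behavior in the paragraph above.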
With reference to the first aspect, in some implementations of the first aspect, the sample processing is one of the following: labeling processing, sample selection processing, scaling processing, and preprocessing.

With reference to the first aspect, in some implementations of the first aspect, the function corresponding to the image segmentation function control is a deep-learning-based segmentation algorithm.
In a second aspect, an electronic device is provided, including: one or more processors; and one or more memories storing one or more computer programs, the one or more computer programs including instructions that, when executed by the one or more processors, cause the electronic device to perform the method in the first aspect or any possible implementation thereof.

In a third aspect, a model generation apparatus is provided, including a processor coupled to a memory, the memory being used to store a computer program and the processor being used to run the computer program, so that the model generation apparatus performs the method in the first aspect or any possible implementation thereof.

With reference to the third aspect, in some implementations of the third aspect, the model generation apparatus further includes one or both of the memory and a transceiver, the transceiver being used to receive and/or send signals.

In a fourth aspect, a computer-readable storage medium is provided, the computer-readable storage medium including a computer program or instructions that, when run on a computer, cause the method in the first aspect or any possible implementation thereof to be performed.

In a fifth aspect, a computer program product is provided, the computer program product including a computer program or instructions that, when run on a computer, cause the method in the first aspect or any possible implementation thereof to be performed.

In a sixth aspect, a computer program is provided that, when run on a computer, causes the method in the first aspect or any possible implementation thereof to be performed.
Description of the drawings
Figure 1 is a schematic diagram of the system architecture provided by an embodiment of the present application.

Figure 2 is a framework diagram of the model generation method provided by an embodiment of the present application.

Figure 3 is a schematic diagram of a model generation method provided by an embodiment of the present application.

Figure 4 is a schematic diagram of functional modules provided by an embodiment of the present application.

Figure 5 is a processing flow chart of a data set provided by an embodiment of the present application.

Figure 6 is a schematic diagram of image segmentation provided by an embodiment of the present application.

Figure 7 is a processing flow chart of another data set provided by an embodiment of the present application.

Figure 8 is a processing flow chart of another data set provided by an embodiment of the present application.

Figure 9 is a schematic diagram of a data set processing flow provided by an embodiment of the present application.

Figure 10 is a schematic diagram of another data set processing flow provided by an embodiment of the present application.

Figure 11 is a schematic diagram of another data set processing flow provided by an embodiment of the present application.

Figure 12 is a schematic diagram of another data set processing flow provided by an embodiment of the present application.

Figure 13 is a schematic diagram of another data set processing flow provided by an embodiment of the present application.

Figure 14 is a schematic diagram of another data set processing flow provided by an embodiment of the present application.

Figure 15 is a schematic diagram of another data set processing flow provided by an embodiment of the present application.

Figure 16 is a schematic flow chart of a model generation method provided by an embodiment of the present application.
Detailed description
The technical solutions in the embodiments of the present application are described below with reference to the accompanying drawings.
The terminology used in the following embodiments is for the purpose of describing specific embodiments only and is not intended to limit the application. As used in the specification and the appended claims of this application, the singular forms "a", "an", "the", "the above", and "this" are intended to also cover expressions such as "one or more", unless the context clearly indicates otherwise. It should also be understood that the term "and/or" describes the association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B can mean: A alone, both A and B, or B alone, where A and B may be singular or plural. The character "/" generally indicates that the associated objects are in an "or" relationship.
Reference in this specification to "one embodiment", "some embodiments", and the like means that a particular feature, structure, or characteristic described in connection with that embodiment is included in one or more embodiments of the application. Therefore, the phrases "in one embodiment", "in some embodiments", "in other embodiments", and so on appearing in different places in this specification do not necessarily refer to the same embodiment, but rather mean "one or more but not all embodiments", unless otherwise specifically emphasized. The terms "including", "includes", "having", and their variants all mean "including but not limited to", unless otherwise specifically emphasized.
The model generation method provided by this application can be applied in the system architecture shown in Figure 1, in which the terminal device 11 communicates with the server 12 over a network. The terminal device 11 sends a sample image data set and a user intention to the server 12, where the user intention represents the processing that the user requires for the sample image data set. The server 12 trains a corresponding model on the sample image data set according to the user intention, and finally generates the model that the user actually needs. That is to say, the user first uploads the data set to be processed to the AI development platform provided by the embodiments of this application, and then selects the corresponding processing operations according to actual needs to generate a model.
It should be understood that the terminal device 11 in the above system architecture diagram includes, but is not limited to, mobile phones, tablet computers, wearable electronic devices with wireless communication functions (such as smart watches), and the like. Exemplary embodiments of portable electronic devices include, but are not limited to, portable electronic devices running the operating systems indicated in image PCTCN2022126426-appb-000001 or other operating systems. It should also be understood that in some other embodiments the above-mentioned electronic device may not be a portable electronic device but a desktop computer. The server 12 can be implemented as an independent server or as a server cluster composed of multiple servers.
Figure 2 shows a framework diagram of a model generation method provided by an embodiment of the present application. As shown in Figure 2, the model generation framework 200 includes an image segmentation module 210, an image classification module 220, an image target detection module 230, and an optical character recognition (OCR) module 240. Each of these modules is introduced in detail below.
The image segmentation module 210 can be used to perform image segmentation. Image segmentation is the technique and process of dividing an image into a number of specific regions with unique properties and extracting objects of interest; it is a key step from image processing to image analysis. Existing image segmentation methods mainly fall into the following categories: threshold-based methods, region-based methods, edge-based methods, and methods based on specific theories. The embodiments of the present application mainly use deep-learning-based segmentation algorithms. From a mathematical perspective, image segmentation is the process of dividing a digital image into mutually disjoint regions. The segmentation process is also a labeling process, in which pixels belonging to the same region are assigned the same number. Image segmentation can be applied to pixel-level detection and edge recognition of inspected objects, for example identifying crack regions on silicon wafers or damaged areas on bearings and other fine parts.
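The "labeling" view of segmentation mentioned above — pixels of the same region receive the same number — can be illustrated with a simple connected-component labeling pass. This is a classical toy baseline for intuition only, not the deep-learning segmentation the application actually uses.

```python
def label_regions(grid):
    """Assign the same label to 4-connected pixels of equal value."""
    h, w = len(grid), len(grid[0])
    labels = [[0] * w for _ in range(h)]
    next_label = 0
    for sy in range(h):
        for sx in range(w):
            if labels[sy][sx]:
                continue                      # already labeled
            next_label += 1
            labels[sy][sx] = next_label
            stack = [(sy, sx)]                # flood fill from this seed
            while stack:
                y, x = stack.pop()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w and not labels[ny][nx]
                            and grid[ny][nx] == grid[sy][sx]):
                        labels[ny][nx] = next_label
                        stack.append((ny, nx))
    return labels

image = [[0, 0, 1],
         [0, 1, 1],
         [0, 0, 0]]
print(label_regions(image))   # [[1, 1, 2], [1, 2, 2], [1, 1, 1]]
```

Every pixel of the connected background region gets label 1 and every pixel of the foreground blob gets label 2, which is exactly the disjoint-region partition described in the paragraph above.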
The image classification module 220 can be used to perform image classification. Image classification is an image processing method that distinguishes targets of different categories based on the different features they exhibit in image information. Image classification can be applied to classifying and judging inspected materials, for example making a binary pass/fail judgment on a material, detecting the color of an inspected object, detecting food types, subdividing defects, or classifying inspected objects by material.
图像目标检测模块230,可以用于执行图像目标检测。图像目标检测是一种利用图像处理与模式识别等领域的理论和方法。可以从图像中定位感兴趣的目标,判断每个目标的具体类别,并给出每个目标的边界框。图像目标检测可以应用于对检测物料中的目标进行定位及分类,适合多目标检测、小目标检测或计数等。例如,确定药品药丸的数量、确定部件的缺陷定位等。The image object detection module 230 may be used to perform image object detection. Image target detection is a theory and method that utilizes the fields of image processing and pattern recognition. Objects of interest can be located from the image, the specific category of each object is determined, and the bounding box of each object is given. Image target detection can be used to locate and classify targets in detection materials, and is suitable for multi-target detection, small target detection or counting, etc. For example, determine the number of pharmaceutical pills, determine the location of defects in components, etc.
光学字符识别模块240,可以用于执行光学字符识别。光学字符识别是指对文本资料的图像文件进行分析识别处理,获取文字及版面信息的过程。光学字符识别可以应用于单字符标注与识别、多字符标注与识别。能够打破传统方技术上的局限性,解曲线字符识别、低对比度字符识别、较大字符识别等复杂的字符识别问题。例如,可以识别精细零件上的文字。The optical character recognition module 240 may be used to perform optical character recognition. Optical character recognition refers to the process of analyzing and recognizing image files of text materials to obtain text and layout information. Optical character recognition can be applied to single-character labeling and recognition, and multi-character labeling and recognition. It can break the limitations of traditional methods and solve complex character recognition problems such as curve character recognition, low contrast character recognition, and large character recognition. For example, text on fine parts can be recognized.
Figure 3 is a schematic diagram of a model generation method provided by an embodiment of this application. As shown in the figure, data set 300 includes, but is not limited to, an image data set, a video data set, or a text data set. The generated model has a tree structure. According to actual needs, the user can select modules with different functions to process data set 300, such as the image segmentation module, image classification module, image object detection module, or optical character recognition module. Figure 3 uses only these four modules as an example. The user can select the required modules to build tree branches and thereby carry out the next processing step on data set 300.
By way of example, Figure 3 shows a model generating three serial or parallel schemes. In the first, data set 300 undergoes image segmentation, image object detection, optical character recognition, and image classification in sequence. In the second, data set 300 undergoes image segmentation, image object detection, and optical character recognition in sequence. In the third, only image classification is performed on data set 300.
Taking the first scheme as an example: after data set 300 undergoes image segmentation, the resulting segmentation data is passed to the image object detection module. In other words, the input of each module is the output of the previous module.
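The rule that each module's input is the previous module's output can be sketched as a simple chain. The stage functions below are hypothetical stand-ins for the four function modules, not the actual deep-learning models:

```python
def run_pipeline(data, stages):
    """Feed each stage's output into the next stage's input."""
    for stage in stages:
        data = stage(data)
    return data

# Hypothetical stand-ins for the four function modules; each one just
# records that it ran, in place of a real model.
def segment(images):
    return [f"seg({img})" for img in images]

def detect(images):
    return [f"det({img})" for img in images]

def ocr(images):
    return [f"ocr({img})" for img in images]

def classify(images):
    return [f"cls({img})" for img in images]

# The first scheme of Figure 3: segmentation, detection, OCR, classification.
result = run_pipeline(["img"], [segment, detect, ocr, classify])
print(result)  # ['cls(ocr(det(seg(img))))']
```

The nesting of the result makes the data flow explicit: classification sees what OCR produced, which saw what detection produced, and so on back to the raw data set.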
Optionally, the user can freely combine and connect the above four modules according to actual needs, except that no further function module can be connected after image classification.
Optionally, once the tree structure shown in Figure 3 has been built, the user can selectively generate the model corresponding to it, for example by outputting a software development kit (SDK) corresponding to the model.
It should be understood that if the tree structure contains N child nodes, there will ultimately be N outputs. Each child node in the structure corresponds to one complete scheme, and the user can choose to output all or only some of these schemes according to actual needs.
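One way to read this is that each complete scheme is the path from the data set down to a leaf of the tree, so enumerating leaves enumerates the outputs (the user may also stop at an intermediate node for a partial scheme). The tree below mirrors Figure 3, but its exact shape and node names are assumptions for illustration:

```python
def leaf_paths(tree, prefix=()):
    """Yield one root-to-leaf path per leaf; each path is a complete scheme."""
    name, children = tree
    path = prefix + (name,)
    if not children:
        yield path
    for child in children:
        yield from leaf_paths(child, path)

# Hypothetical layout of the three schemes of Figure 3.
tree = ("dataset 300", [
    ("segmentation 310", [("detection 320", [
        ("OCR 330", [("classification 340", [])]),
        ("OCR 360", []),
    ])]),
    ("classification 350", []),
])

paths = list(leaf_paths(tree))
print(len(paths))  # 3
```

With three leaves there are three outputs, matching the three schemes described for Figure 3.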
By way of example, the user can select or delete function modules through an input device such as a mouse. As shown in Figure 4, when the user left-clicks data set 300, four function modules appear for selection: image segmentation, image classification, image object detection, and optical character recognition. Left-clicking image segmentation performs image segmentation on data set 300; clicking image classification performs image classification on data set 300; clicking image object detection performs image object detection on data set 300; and clicking optical character recognition performs optical character recognition on data set 300.
Likewise, after the user clicks image segmentation, the same four function modules appear again so that the user can choose the specific task to perform next. That is, after each processing step the user can choose the next specific processing method, and there may be one or more such methods; this application does not limit this.
By way of example, when the user right-clicks the image segmentation module, this deletes the module. If other function modules follow the image segmentation module, right-clicking it also deletes those downstream modules. For example, as shown in Figure 3, when the user right-clicks the image object detection module 320, the image object detection module 320, the optical character recognition modules 330 and 360, and the image classification module 340 are all deleted, while the image segmentation module 310 and the image classification module 350 in Figure 3 are retained.
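The right-click deletion described above removes a node together with its entire subtree. A minimal sketch, with the tree shape and module names assumed from Figure 3:

```python
def delete_subtree(tree, target):
    """Return a copy of the tree with `target` and everything below it removed."""
    name, children = tree
    if name == target:
        return None  # this node and its whole subtree are dropped
    kept = [sub for sub in (delete_subtree(child, target) for child in children)
            if sub is not None]
    return (name, kept)

# Hypothetical layout of Figure 3.
tree = ("dataset 300", [
    ("segmentation 310", [("detection 320", [
        ("OCR 330", [("classification 340", [])]),
        ("OCR 360", []),
    ])]),
    ("classification 350", []),
])

# Deleting detection 320 also removes OCR 330, OCR 360, and classification 340;
# segmentation 310 and classification 350 survive.
print(delete_subtree(tree, "detection 320"))
```

Because deletion is recursive, one operation on any node prunes every scheme that passed through it, which is the behavior the example above describes.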
It should be understood that selecting or deleting function modules through an input device such as a mouse is only one example. In the embodiments of this application, selecting, deleting, or changing modules can also be implemented in other ways, such as dragging function modules or setting controls.
For ease of understanding, the processing flow of data set 300 in Figure 3 is described in detail below, taking as an example a data set 300 uploaded by the user that contains 100 images.
Figure 5 shows one processing flow for the data set in Figure 3. The specific steps are as follows:
S501: image segmentation.
It should be understood that the 100 images in data set 300 are segmented. To aid understanding, Figure 6 is an example of image data provided by this application. As shown in Figure 6, the segmented image 600 may include four regions 610, 620, 630, and 640. These four regions are only an example; other regions may also be included. Image segmentation can assign the same number to pixels belonging to the same region.
S502: image object detection.
It should be understood that objects of interest are located in the regions segmented from the 100 images in step S501, the specific category of each object is determined, and a bounding box is produced for each object. For example, an object may be a trademark logo or a description label.
S503: optical character recognition.
It should be understood that character recognition analysis is performed on the objects located in step S502.
S504: image classification.
It should be understood that the 100 images from step S503 are classified. For example, the region numbered A in Figure 6 is assigned to one category, the region numbered B to another, the region numbered C to another, and the region numbered D to another.
Through steps S501 to S504, a model can be generated for processing images through the pipeline of image segmentation, image object detection, optical character recognition, and image classification.
Figure 7 shows another processing flow for the data set in Figure 3. The specific steps are as follows:
S701: image segmentation.
It should be understood that the 100 images in data set 300 are segmented. Image segmentation can assign the same number to pixels belonging to the same region.
S702: image object detection.
It should be understood that objects of interest are located in the regions segmented from the 100 images in step S701, the specific category of each object is determined, and a bounding box is produced for each object. For example, an object may be a trademark logo or a description label.
S703: optical character recognition.
It should be understood that character recognition analysis is performed on the object data located in step S702.
Through steps S701 to S703, a model can be generated for processing images through the pipeline of image segmentation, image object detection, and optical character recognition.
Figure 8 shows another processing flow for the data set in Figure 3. The specific steps are as follows:
S801: image classification.
According to the different features reflected in the image information, regions of different categories in the 100 images of data set 300 are separated and marked with different numbers.
Through step S801, a model is generated that can directly perform image classification on images.
For ease of understanding, taking the task of finding defects on a component as an example, Figures 9, 10, 11, and 12 show schematic diagrams of a data set being processed in one tree-structure model.
Figure 9 shows the processing flow of this tree-structure model. On one branch, the data set undergoes image object detection (detection 1 in Figure 9) followed by optical character recognition (OCR 1 in Figure 9). On the other branch, the data set undergoes image object detection (detection 1 in Figure 9) followed by image classification (classification 1 in Figure 9). Detection 1, OCR 1, and classification 1 in Figure 9 can be regarded as different function controls.
Figure 10 is a schematic diagram of detection 1 in the first branch of the processing flow. The image object detection corresponding to detection 1 locates a specific region in the image; in the current setting, it is the region above the characters "9BC3".
Figure 11 is a schematic diagram of OCR 1 in the first branch of the processing flow. Character recognition is performed on the region located by detection 1. As the figure shows, OCR 1 can also post-process the output of detection 1; this post-processing may shift the determined region by a fixed offset so that the character region "9BC3" in the figure is located.
It should be understood that the post-processing above can further adjust the results of the four kinds of processing applied to the sample image data set. For example, the region corresponding to the processed output can be offset or scaled by a certain amount so that it becomes more accurate.
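The offset-and-scale post-processing of a located region can be sketched as follows. The (x, y, w, h) box format and the centre-based scaling are assumptions made for illustration, not a format mandated by the application:

```python
def adjust_region(box, dx=0, dy=0, scale=1.0):
    """Offset a located region by (dx, dy) and scale it about its centre.

    `box` is assumed to be (x, y, w, h) with (x, y) the top-left corner.
    """
    x, y, w, h = box
    cx, cy = x + w / 2 + dx, y + h / 2 + dy  # shifted centre
    nw, nh = w * scale, h * scale            # scaled extent
    return (cx - nw / 2, cy - nh / 2, nw, nh)

# Shift a detected region down by a fixed 10 px so that it covers the
# character area below it, as in the OCR 1 example.
print(adjust_region((100, 40, 80, 20), dy=10))  # (100.0, 50.0, 80.0, 20.0)
```

A scale below 1.0 shrinks the region (for instance, to keep only the numeric characters), while a nonzero offset implements the fixed shift described for OCR 1.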
By way of example, after the sample image data set undergoes image segmentation, the image is divided into specific regions, and the user interface may display the segmented regions. The user can visually check whether the segmentation of the sample image data set is accurate; when a segmented region deviates from the region the user wants segmented, the specific region corresponding to the segmentation processing can be further adjusted to make the processing more accurate.
By way of example, after the sample image data set undergoes image classification, the images are classified into different regions. Under the preset classification criteria, the classification of the sample image data set may be inaccurate, with regions that should not belong to the same category nevertheless grouped together. The user can adjust the classification by further moving the region corresponding to the output, making the classification of the sample image data set more accurate.
By way of example, after the sample image data set undergoes image object detection, the objects of interest to the user are marked, for example by counting the number of such objects in the sample image data. When some objects were missed by the detection processing, the user can make further adjustments and manually annotate the uncounted objects to improve the counting accuracy.
By way of example, after the sample image data set undergoes optical character recognition, all characters on the current image are marked. In practice, however, the user may be interested only in the numeric characters on the image, and can further scale the region after recognition so that only the numeric characters are recognized.
It should be understood that the user can further process the data obtained from the above four kinds of processing according to actual needs. Further processing of the sample image data set includes, but is not limited to, labeling, sample selection, scaling, and preprocessing. Further processing can be understood as optimizing the processed data, which allows the finally trained model to process data more accurately.
Figure 12 is a schematic diagram of classification 1 in the other branch of the processing flow. The regions located by detection 1 are classified. As the figure shows, there are two types, single code and multiple code: the "9BC3" in the character region of Figure 11 is a single code, while the two "9BD4" strings above and below the character region in Figure 14 are multiple codes.
For ease of understanding, again taking the task of finding defects on a component as an example, Figures 13, 14, and 15 show schematic diagrams of a data set being processed in another tree-structure model.
Figure 13 shows the processing flow of this tree-structure model: the data set undergoes image object detection (detection 1 in Figure 13) followed by image segmentation (segmentation 1 in Figure 13). Detection 1 and segmentation 1 in Figure 13 can be regarded as different function controls.
Figure 14 is a schematic diagram of detection 1 in this processing flow. The image object detection corresponding to detection 1 locates a specific region in the image; in the current setting, it is the position of the diode in the image.
Figure 15 is a schematic diagram of segmentation 1 in this processing flow. For the located diode, segmentation 1 further identifies the defect region and segments it out.
Figure 16 shows a model generation method 1600 provided by an embodiment of this application. The method can be applied in the framework shown in Figure 2 and is described in detail below.
S1601: obtain a sample image data set.
It should be understood that the embodiments of this application do not limit the specific size of the sample image data set.
S1602: determine a tree-structure model based on the sample image data set.
It should be understood that the tree structure includes N layers, each of which includes at least one node. The input of a node in layer 1 is the sample image data set, and the input of a node in layer i is the output of one of the nodes in layer i-1, where N is a positive integer greater than 1 and i = 2, ..., N. Each node is one of the following controls: an image segmentation function control, an image classification function control, an image object detection function control, or an optical character recognition function control.
The embodiments of this application do not limit the specific size of the sample image data set. The nodes (function controls) in the tree-structure model can include at most the following four kinds of function control: an image segmentation function control, an image classification function control, an image object detection function control, and an optical character recognition function control. The model can contain one or more of these four function controls, and the same function control can appear more than once, which provides richer node types. The user can select all or some of the functions in the model for output according to the actual situation. Any node in a given layer of the tree-structure model is connected to exactly one node in the layer above it, not to all nodes of that layer; nodes within the same layer are not connected to each other, and the input of each node is the output of the node in the layer above to which it is connected.
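The layer and parent constraints just described (each node has exactly one parent; a layer-1 node takes the data set as input, and a layer-i node takes one layer-(i-1) output) can be modeled with a small node structure. The class below is an illustrative sketch under those assumptions, not the application's actual implementation:

```python
from dataclasses import dataclass, field
from typing import List, Optional

# The four kinds of function control a node may be.
CONTROLS = {"segmentation", "classification", "detection", "ocr"}

@dataclass
class Node:
    control: str                        # one of the four function controls
    parent: Optional["Node"] = None     # exactly one parent; None means layer 1
    children: List["Node"] = field(default_factory=list)

    def __post_init__(self):
        if self.control not in CONTROLS:
            raise ValueError(f"unknown control: {self.control}")
        if self.parent is not None:
            self.parent.children.append(self)

    @property
    def layer(self):
        """Layer index: 1 for a root node (fed by the data set), else parent + 1."""
        return 1 if self.parent is None else self.parent.layer + 1

root = Node("segmentation")              # layer 1: input is the sample data set
det = Node("detection", parent=root)     # layer 2: input is root's output
print(det.layer)  # 2
```

Because the parent is a single reference rather than a list, a node cannot be wired to more than one upstream node, which matches the constraint that each node connects to exactly one node in the layer above.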
In some embodiments: in response to a first operation by the user, the control options of layer k of the tree structure are displayed; in response to a second operation by the user, the node of layer k is determined from the control options of layer k; according to the determined node of layer k, a tool for performing sample processing on the sample image data set is displayed; and the sample-processed sample image data set is used to train the tree-structure model, where k = 1, 2, ..., N.
It should be understood that, in response to the user's first operation, the control options of a given layer of the tree structure are displayed, and these options include the four kinds of function control (node type) above. Through the user's second operation, one or more of the four function controls are selected. In addition, a tool for performing sample processing on the sample image data set is displayed, with which the user can carry out auxiliary operations on the sample image data set.
In some embodiments, the tree-structure model includes a first sub-model comprising layer m and layer m+1, where the node of layer m is an image segmentation function control and the node of layer m+1 is an image object detection function control, m being a positive integer less than N.
It should be understood that the tree-structure model can include multiple sub-models, of which the first sub-model is one; any function control in the model can be regarded as a sub-model. By way of example, the first sub-model includes an image segmentation function control and an image object detection function control, that is, two function controls in series. Of course, this application does not limit the specific function controls included in the first sub-model; they can be determined according to the user's actual needs.
In some embodiments, the tree-structure model includes a second sub-model comprising layer m, layer m+1, and layer m+2, where the node of layer m is an image segmentation function control, the node of layer m+1 is an image object detection function control, and the node of layer m+2 is an optical character recognition function control, m being a positive integer less than N.
It should be understood that an optical character recognition function control is added after the image object detection function control of the first sub-model, so the second sub-model includes the first sub-model plus the newly added optical character recognition function control. By way of example, the second sub-model includes an image segmentation function control, an image object detection function control, and an optical character recognition function control. Of course, this application does not limit the specific function controls included in the second sub-model; they can be determined according to the user's actual needs. The second sub-model can be understood as being connected in series with the first sub-model: depending on the actual situation, the output data of the first sub-model or of the second sub-model can be selected as the final output data, as can the output data of any individual function control.
In some embodiments, the tree-structure model includes a third sub-model comprising layer j and layer j+1, where the node of layer j is an optical character recognition control and the node of layer j+1 is an image classification control, j being a positive integer less than N.
It should be understood that the third sub-model is connected in parallel with the first sub-model (or the second sub-model). By way of example, the third sub-model includes an optical character recognition function control and an image classification function control. Of course, this application does not limit the specific function controls included in the third sub-model; they can be determined according to the user's actual needs.
Optionally, the user can modify the tree-structure model in real time according to actual needs. By way of example, when the user performs a deletion operation on the image object detection function control in the second sub-model, that control and everything after it can be deleted. In other words, with a single deletion operation the user can delete the corresponding function control together with every function control that follows it, which improves the flexibility of model generation.
In some embodiments, sample processing is one of the following: labeling, sample selection, scaling, or preprocessing.
In some embodiments, the function corresponding to the image segmentation function control is a deep-learning-based segmentation algorithm.
The embodiments of this application provide a tree-structure-based model generation method. The user can select different function modules to process data according to specific needs, and for the processed data can select multiple branch functions for the next processing step. This method improves the flexibility of model generation and can therefore handle more complex data-processing scenarios. At the same time, the user can intuitively and conveniently select different function modules on the user interface according to specific needs, which lowers the difficulty of learning and using the model.
Embodiments of this application provide a computer program product which, when run on an electronic device, causes the electronic device to execute the technical solutions of the above embodiments. The implementation principles and technical effects are similar to those of the related method embodiments above and are not repeated here.
Embodiments of this application provide a readable storage medium containing instructions which, when run on an electronic device, cause the electronic device to execute the technical solutions of the above embodiments. The implementation principles and technical effects are similar and are not repeated here.
Embodiments of this application provide a chip for executing instructions which, when running, executes the technical solutions of the above embodiments. The implementation principles and technical effects are similar and are not repeated here.
A person of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled practitioners may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the embodiments of this application.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods can be implemented in other ways. For example, the apparatus embodiments described above are only illustrative: the division into units is only a logical functional division, and other divisions are possible in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. Furthermore, the mutual couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, apparatuses, or units, and may be electrical, mechanical, or of another form.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, may exist physically on their own, or two or more units may be integrated into one unit.
If implemented in the form of a software functional unit and sold or used as an independent product, the function can be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of this application, in essence, or the part contributing to the prior art, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the method in the embodiments of this application. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above are only specific implementations of the embodiments of this application, but the protection scope of the embodiments of this application is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed by the embodiments of this application shall be covered by the protection scope of the embodiments of this application. Therefore, the protection scope of the embodiments of this application shall be subject to the protection scope of the claims.

Claims (15)

  1. A model generation method, comprising:
    obtaining a sample image data set; and
    determining a model with a tree structure according to the sample image data set, the tree structure comprising N layers, each of the N layers comprising at least one node, wherein an input of a node in the 1st layer is the sample image data set, and an input of a node in an i-th layer is an output of one of the nodes in the (i-1)-th layer, where N is a positive integer greater than 1 and i = 2, ..., N;
    wherein each of the nodes is one of the following controls:
    an image segmentation function control, an image classification function control, an image object detection function control, and an optical character recognition function control.
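The tree structure recited in claim 1 — layer-1 nodes consuming the sample image data set and each deeper node consuming the output of one node in the layer above — can be sketched as a small data structure. The following Python is a minimal illustration only, not the claimed implementation; the node kinds mirror the four function controls, and the `process` callables are placeholders standing in for the controls' actual algorithms.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# The four node kinds named in claim 1 (illustrative labels).
NODE_KINDS = {"segmentation", "classification", "object_detection", "ocr"}

@dataclass
class Node:
    """One node of the tree-structured model.

    `process` is a placeholder for the control's algorithm; in the claim,
    a layer-(i+1) child consumes this node's output as its input.
    """
    kind: str
    process: Callable
    children: List["Node"] = field(default_factory=list)

    def run(self, data):
        out = self.process(data)
        # Feed this node's output to each child node one layer down;
        # a leaf simply returns its own output.
        return [child.run(out) for child in self.children] or out

# Layer 1 receives the sample image data set; layer 2 chains on its output.
root = Node("segmentation", lambda imgs: [f"region({i})" for i in imgs])
root.children.append(Node("ocr", lambda regions: [r.upper() for r in regions]))

result = root.run(["img0", "img1"])
print(result)  # [['REGION(IMG0)', 'REGION(IMG1)']]
```

The chaining rule of the claim appears in `run`: the parent's output `out` becomes the child's input, so an N-layer path is a composition of N processing steps rooted at the sample image data set.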
  2. The method according to claim 1, wherein determining the model with the tree structure according to the sample image data set comprises:
    in response to a first operation of a user, displaying control options for a k-th layer of the tree structure;
    in response to a second operation of the user, determining a node of the k-th layer from the control options for the k-th layer;
    displaying, according to the determined node of the k-th layer, a tool for performing sample processing on the sample image data set; and
    training the model with the tree structure by using the sample image data set after the sample processing,
    where k = 1, 2, ..., N.
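The layer-by-layer interactive flow of claim 2 (display options, let the user pick a node, then surface a sample-processing tool for that choice) can be outlined as a simple loop. This is a hedged sketch only: `build_tree_interactively`, `choose`, and `LAYER_OPTIONS` are hypothetical names introduced for illustration and do not appear in the application.

```python
# Illustrative option list mirroring the four function controls.
LAYER_OPTIONS = ["segmentation", "classification", "object_detection", "ocr"]

def build_tree_interactively(n_layers, choose):
    """Sketch of claim 2's flow; `choose(k, options)` stands in for the
    user's second operation (selecting a node from the displayed options)."""
    chosen = []
    for k in range(1, n_layers + 1):
        options = LAYER_OPTIONS        # first operation: display the k-th layer's options
        node = choose(k, options)      # second operation: pick the k-th layer's node
        assert node in options
        # Here the UI would show the sample-processing tool for `node`,
        # and training would use the processed sample image data set.
        chosen.append(node)
    return chosen

# Example: a scripted "user" picking segmentation for layer 1 and OCR for layer 2.
picks = iter(["segmentation", "ocr"])
layers = build_tree_interactively(2, lambda k, opts: next(picks))
print(layers)  # ['segmentation', 'ocr']
```

The point of the sketch is the ordering the claim fixes: option display, node selection, and sample processing happen per layer k before the assembled tree is trained on the processed data set.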
  3. The method according to claim 2, wherein the model with the tree structure comprises a first sub-model, the first sub-model comprising an m-th layer and an (m+1)-th layer, wherein a node of the m-th layer is the image segmentation function control and a node of the (m+1)-th layer is the image object detection function control, where m is a positive integer less than N.
  4. The method according to claim 2, wherein the model with the tree structure comprises a second sub-model, the second sub-model comprising an m-th layer, an (m+1)-th layer and an (m+2)-th layer, wherein a node of the m-th layer is the image segmentation function control, a node of the (m+1)-th layer is the image object detection function control, and a node of the (m+2)-th layer is the optical character recognition function control, where m is a positive integer less than N.
  5. The method according to any one of claims 2 to 4, wherein the sample processing is one of the following:
    labeling processing, sample selection processing, scaling processing, and preprocessing.
  6. The method according to claim 5, wherein a function corresponding to the image segmentation function control is a deep-learning-based segmentation algorithm.
  7. An electronic device, comprising:
    one or more processors; and
    one or more memories,
    wherein the one or more memories store one or more computer programs, the one or more computer programs comprising instructions that, when executed by the one or more processors, cause the electronic device to perform the following steps:
    obtaining a sample image data set; and
    determining a model with a tree structure according to the sample image data set, the tree structure comprising N layers, each of the N layers comprising at least one node, wherein an input of a node in the 1st layer is the sample image data set, and an input of a node in an i-th layer is an output of one of the nodes in the (i-1)-th layer, where N is a positive integer greater than 1 and i = 2, ..., N;
    wherein each of the nodes is one of the following controls:
    an image segmentation function control, an image classification function control, an image object detection function control, and an optical character recognition function control.
  8. The electronic device according to claim 7, wherein, for determining the model with the tree structure according to the sample image data set, the instructions, when executed by the one or more processors, cause the electronic device to perform the following steps:
    in response to a first operation of a user, displaying control options for a k-th layer of the tree structure;
    in response to a second operation of the user, determining a node of the k-th layer from the control options for the k-th layer;
    displaying, according to the determined node of the k-th layer, a tool for performing sample processing on the sample image data set; and
    training the model with the tree structure by using the sample image data set after the sample processing,
    where k = 1, 2, ..., N.
  9. The electronic device according to claim 8, wherein the model with the tree structure comprises a first sub-model, the first sub-model comprising an m-th layer and an (m+1)-th layer, wherein a node of the m-th layer is the image segmentation function control and a node of the (m+1)-th layer is the image object detection function control, where m is a positive integer less than N.
  10. The electronic device according to claim 8, wherein the model with the tree structure comprises a second sub-model, the second sub-model comprising an m-th layer, an (m+1)-th layer and an (m+2)-th layer, wherein a node of the m-th layer is the image segmentation function control, a node of the (m+1)-th layer is the image object detection function control, and a node of the (m+2)-th layer is the optical character recognition function control, where m is a positive integer less than N.
  11. The electronic device according to any one of claims 8 to 10, wherein the sample processing is one of the following:
    labeling processing, sample selection processing, scaling processing, and preprocessing.
  12. The electronic device according to claim 11, wherein a function corresponding to the image segmentation function control is a deep-learning-based segmentation algorithm.
  13. A model generation apparatus, comprising a processor coupled to a memory, wherein the memory is configured to store a computer program, and the processor is configured to run the computer program so that the model generation apparatus performs the method according to any one of claims 1 to 6.
  14. The model generation apparatus according to claim 13, further comprising one or more of the memory and a transceiver, the transceiver being configured to receive signals and/or send signals.
  15. A computer-readable storage medium, comprising a computer program or instructions which, when run on a computer, cause the method according to any one of claims 1 to 6 to be performed.
PCT/CN2022/126426 2022-07-26 2022-10-20 Model generation method and apparatus, electronic device, and storage medium WO2024021321A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210881785.2 2022-07-26
CN202210881785.2A CN114943976B (en) 2022-07-26 2022-07-26 Model generation method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2024021321A1 true WO2024021321A1 (en) 2024-02-01

Family

ID=82911496

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/126426 WO2024021321A1 (en) 2022-07-26 2022-10-20 Model generation method and apparatus, electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN114943976B (en)
WO (1) WO2024021321A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114943976B (en) * 2022-07-26 2022-10-11 深圳思谋信息科技有限公司 Model generation method and device, electronic equipment and storage medium

Citations (5)

Publication number Priority date Publication date Assignee Title
CN108416363A (en) * 2018-01-30 2018-08-17 平安科技(深圳)有限公司 Generation method, device, computer equipment and the storage medium of machine learning model
CN109949031A (en) * 2019-04-02 2019-06-28 山东浪潮云信息技术有限公司 A kind of machine learning model training method and device
CN112990423A (en) * 2019-12-16 2021-06-18 华为技术有限公司 Artificial intelligence AI model generation method, system and equipment
US20220058531A1 (en) * 2020-08-19 2022-02-24 Royal Bank Of Canada System and method for cascading decision trees for explainable reinforcement learning
CN114943976A (en) * 2022-07-26 2022-08-26 深圳思谋信息科技有限公司 Model generation method and device, electronic equipment and storage medium

Family Cites Families (12)

Publication number Priority date Publication date Assignee Title
CN108229286A (en) * 2017-05-27 2018-06-29 北京市商汤科技开发有限公司 Language model generates and application process, device, electronic equipment and storage medium
CN107899244A (en) * 2017-11-29 2018-04-13 武汉秀宝软件有限公司 A kind of construction method and system of AI models
EP3570164B1 (en) * 2018-05-14 2023-04-26 Schneider Electric Industries SAS Method and system for generating a mobile application from a desktop application
CN109948668A (en) * 2019-03-01 2019-06-28 成都新希望金融信息有限公司 A kind of multi-model fusion method
CN110309888A (en) * 2019-07-11 2019-10-08 南京邮电大学 A kind of image classification method and system based on layering multi-task learning
CN111046886B (en) * 2019-12-12 2023-05-12 吉林大学 Automatic identification method, device and equipment for number plate and computer readable storage medium
CN111881315A (en) * 2020-06-24 2020-11-03 华为技术有限公司 Image information input method, electronic device, and computer-readable storage medium
AU2021301463A1 (en) * 2020-06-30 2022-12-22 Australia And New Zealand Banking Group Limited Method and system for generating an ai model using constrained decision tree ensembles
CN111782879B (en) * 2020-07-06 2023-04-18 Oppo(重庆)智能科技有限公司 Model training method and device
CN111931841A (en) * 2020-08-05 2020-11-13 Oppo广东移动通信有限公司 Deep learning-based tree processing method, terminal, chip and storage medium
CN113836128A (en) * 2021-09-24 2021-12-24 北京拾味岛信息科技有限公司 Abnormal data identification method, system, equipment and storage medium
CN114418035A (en) * 2022-03-25 2022-04-29 腾讯科技(深圳)有限公司 Decision tree model generation method and data recommendation method based on decision tree model

Patent Citations (5)

Publication number Priority date Publication date Assignee Title
CN108416363A (en) * 2018-01-30 2018-08-17 平安科技(深圳)有限公司 Generation method, device, computer equipment and the storage medium of machine learning model
CN109949031A (en) * 2019-04-02 2019-06-28 山东浪潮云信息技术有限公司 A kind of machine learning model training method and device
CN112990423A (en) * 2019-12-16 2021-06-18 华为技术有限公司 Artificial intelligence AI model generation method, system and equipment
US20220058531A1 (en) * 2020-08-19 2022-02-24 Royal Bank Of Canada System and method for cascading decision trees for explainable reinforcement learning
CN114943976A (en) * 2022-07-26 2022-08-26 深圳思谋信息科技有限公司 Model generation method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN114943976A (en) 2022-08-26
CN114943976B (en) 2022-10-11

Similar Documents

Publication Publication Date Title
CN107808143B (en) Dynamic gesture recognition method based on computer vision
US11657602B2 (en) Font identification from imagery
CN107239731B (en) Gesture detection and recognition method based on Faster R-CNN
CN107833213B (en) Weak supervision object detection method based on false-true value self-adaptive method
WO2020238054A1 (en) Method and apparatus for positioning chart in pdf document, and computer device
US11704357B2 (en) Shape-based graphics search
WO2020244075A1 (en) Sign language recognition method and apparatus, and computer device and storage medium
CN112784810B (en) Gesture recognition method, gesture recognition device, computer equipment and storage medium
CN109284729A (en) Method, apparatus and medium based on video acquisition human face recognition model training data
CN110136198B (en) Image processing method, apparatus, device and storage medium thereof
CN111368636B (en) Object classification method, device, computer equipment and storage medium
WO2021238548A1 (en) Region recognition method, apparatus and device, and readable storage medium
Jalab et al. Human computer interface using hand gesture recognition based on neural network
CN111860362A (en) Method and device for generating human face image correction model and correcting human face image
CN109284779A (en) Object detection method based on deep full convolution network
US11681409B2 (en) Systems and methods for augmented or mixed reality writing
WO2022193753A1 (en) Continuous learning method and apparatus, and terminal and storage medium
JP6787831B2 (en) Target detection device, detection model generation device, program and method that can be learned by search results
US11481577B2 (en) Machine learning (ML) quality assurance for data curation
CN113051914A (en) Enterprise hidden label extraction method and device based on multi-feature dynamic portrait
Patel American sign language detection
WO2024021321A1 (en) Model generation method and apparatus, electronic device, and storage medium
CN115335872A (en) Training method of target detection network, target detection method and device
CN115147380A (en) Small transparent plastic product defect detection method based on YOLOv5
WO2023273572A1 (en) Feature extraction model construction method and target detection method, and device therefor

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22952760

Country of ref document: EP

Kind code of ref document: A1