CN115098998B - Model training method and system based on simulation data - Google Patents

Model training method and system based on simulation data

Info

Publication number
CN115098998B
CN115098998B (application CN202210576077.8A)
Authority
CN
China
Prior art keywords
simulation
experiment
data
simulated
simulation experiment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210576077.8A
Other languages
Chinese (zh)
Other versions
CN115098998A (en)
Inventor
杨吉利
刘凯
刘利非
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Xiding Intelligent Technology Co ltd
Original Assignee
Shanghai Xiding Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Xiding Intelligent Technology Co ltd filed Critical Shanghai Xiding Intelligent Technology Co ltd
Priority to CN202210576077.8A
Publication of CN115098998A
Application granted
Publication of CN115098998B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 - Computer-aided design [CAD]
    • G06F30/20 - Design optimisation, verification or simulation
    • G06F2119/00 - Details relating to the type or aim of the analysis or the optimisation
    • G06F2119/02 - Reliability analysis or reliability optimisation; Failure analysis, e.g. worst case scenario performance, failure mode and effects analysis [FMEA]

Abstract

The invention discloses a model training method based on simulation data. In the method, a simulation experiment environment corresponding to a real experiment environment is first constructed, the simulation experiment environment comprising a simulation experiment table, simulation experiment instruments, a simulation character and actions. Then, based on a reinforcement learning algorithm, the simulation character and the simulation experiment instruments in the simulation experiment environment complete the experiment contents according to preset standard experiment steps, so that simulation experiment data are obtained. Finally, real experiment data and the simulation experiment data are input into an AI model for training to obtain a trained AI model, so that the trained AI model can be used to recognize real experiment data. The method reduces the cost of manual labeling and improves model training efficiency and model recognition accuracy.

Description

Model training method and system based on simulation data
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to a model training method, system, computing device and storage medium based on simulation data.
Background
A model is a core concept in the field of artificial intelligence. For the specific requirements of its application scenario, each artificial intelligence application builds or selects a suitable model, trains and fits the model with the corresponding data and algorithms, and then uses the trained and fitted model to automatically solve the tasks of that specific application scenario.
At present, for assessing the completion of hands-on operations such as physics and chemistry experiments in junior high school, an AI recognition model can be used to intelligently recognize pictures taken during the experiment. The data are generally collected manually: for example, the content of the experiment pictures is manually reviewed and labeled, the pictures are stored in a standard JPG format and classified, and pictures with the same label are placed in the same folder. This manual labeling method consumes considerable manpower and resources, while reducing the amount of data to cut costs lowers the accuracy of model recognition.
Therefore, a model training method is needed that can provide a large amount of training and learning data for the model while reducing cost, and that can improve the accuracy of model recognition, so as to solve the above problems in the prior art.
Disclosure of Invention
In view of the foregoing, the present invention has been developed to provide a model training method, system, computing device, and storage medium based on simulation data that overcomes or at least partially solves the foregoing problems.
According to one aspect of the present invention, a model training method based on simulation data is provided. In the method, a simulation experiment environment corresponding to a real experiment environment is first constructed, the simulation experiment environment including a simulation experiment table, simulation experiment instruments, a simulation character and actions. Then, based on a reinforcement learning algorithm, the simulation character and the simulation experiment instruments in the simulation experiment environment complete the experiment contents according to preset standard experiment steps, and simulation experiment data are obtained. Finally, real experiment data and the simulation experiment data are input into an AI model for training to obtain a trained AI model, so that the trained AI model can be used to recognize real experiment data.
According to this scheme, simulation experiment data can replace part of the real experiment data as training data for the model. On one hand, the accuracy and efficiency of labeling the training data are improved; on the other hand, the recognition accuracy of the trained model is improved, the cost is reduced, and the efficiency of AI recognition of experiment completion is increased.
Optionally, in the above method, the AI model is an image recognition model, and a weight of the real experimental data in a model training process is greater than a weight of the simulation experimental data. This prevents model failure due to over-fitting.
Optionally, in the above method, the sizes and positions of the simulation experiment table, the simulation experiment instruments and the simulation character required in the experiment process can be set through the GPU; the actions of the simulation character and the feedback of the simulation experiment instruments in response to those actions can also be set; and random simulated lighting can be added to the simulation experiment environment. In this way, the generation effect of the simulation experiment environment can be displayed in real time, and the environment can be adjusted according to actual needs.
Optionally, in the method, the simulated character action includes motion limitation of the simulated character, path setting of the simulated character to take the simulated experiment instrument, and feedback of the simulated experiment instrument in response to the simulated character action includes a placement step, a placement position and a state of the simulated experiment instrument.
Optionally, in the above method, the simulation experiment table, the simulation experiment instrument, the simulation person and the action in the simulation experiment environment are set independently from each other, so as to facilitate subsequent independent extraction.
Optionally, in the above method, the actions of the simulation character, the moving positions of the simulation experiment instruments and the placement steps in the simulation environment are first determined according to the preset standard experiment steps. A penalty is applied when an error occurs in the simulation character's action, when the deviation between a simulation experiment instrument's moving position and the standard moving position is larger than a first preset threshold, or when an error occurs in a placement step of the simulation experiment instrument. A reward is given when the simulation character's action is correct, when the deviation between the moving position of the simulation experiment instrument and the standard moving position is smaller than a second preset threshold, or when the placement step of the simulation experiment instrument is correct. A simulation experiment video is obtained while the simulation character completes the simulation experiment.
Optionally, in the above method, the obtained simulation experiment video is first decomposed into multiple frame images; then, the simulation character's actions and the position data of the simulation experiment instruments are extracted from each frame; finally, the simulation character's actions and the simulation experiment instruments are extracted and labeled according to the position data to obtain the simulation experiment data.
According to another aspect of the present invention, there is provided a model training system based on simulation data, the system comprising: the simulation system comprises a simulation environment construction module, a simulation module and a training module. The simulation environment construction module can construct a simulation experiment environment corresponding to the real experiment environment, wherein the simulation experiment environment comprises a simulation experiment table, a simulation experiment instrument, a simulation character and actions. The simulation module can enable the simulation characters and the simulation experiment instruments in the simulation experiment environment to complete experiment contents according to preset standard experiment steps based on the reinforcement learning algorithm, and simulation experiment data are obtained. The training module can input real experimental data and simulation experimental data into the AI model for training to obtain a trained AI model so as to identify the real experimental data by using the trained AI model.
According to yet another aspect of the present invention, there is provided a computing device comprising: at least one processor; and a memory storing program instructions, wherein the program instructions are configured to be executed by the at least one processor, the program instructions comprising instructions for performing the above-described method.
According to yet another aspect of the present invention, there is provided a readable storage medium storing program instructions that, when read and executed by a computing device, cause the computing device to perform the above-described method.
According to the scheme of the invention, the following technical effects can be achieved:
1. The amount of manually labeled real data required for model training is greatly reduced, since it can be partially replaced by simulated labeled data. The extraction and labeling of the simulation data are generated automatically by the system of this scheme, so no manual labeling is needed and a great deal of cost is saved.
2. The simulation environment and the deep learning training environment are split into two parts that run separately: the GPU runs the simulation environment, while the AI accelerator runs the reinforcement deep learning to obtain simulation experiment data and runs the model training based on the simulation data and real data, achieving higher operating efficiency.
3. The simulation experiment data are extracted and labeled automatically, so the labeled information is more accurate, which improves the accuracy of the training data and therefore the recognition accuracy of the trained model.
The foregoing is only an overview of the technical solution of the present invention. In order to understand the technical means of the invention more clearly, so that it can be implemented according to the description, and to make the above and other objects, features and advantages of the invention more apparent, the following detailed description is provided.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:
FIG. 1 illustrates a schematic diagram of a model training system 100 based on simulation data in accordance with one embodiment of the present invention;
FIG. 2 illustrates a schematic diagram of a computing device 200, according to one embodiment of the invention;
FIG. 3 illustrates a flow diagram of a model training method 300 based on simulation data, in accordance with one embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
For machine learning or deep learning, the conventional practice is to input labeled data into the model for training, which requires manually preparing data and labels. However, in many scenarios, such as complex image processing, enough data cannot be prepared, and it is difficult to provide sufficient data through manual labeling.
The greatest benefit of simulation is that millions or even tens of millions of data items can be generated even without a real system, so that real scenes and simulation models can be combined. Simulation can supplement data that cannot be measured in the real scene, while the real scene can verify the credibility of the simulation data; the two complement each other and make model training more complete. This scheme improves the training method of the AI recognition model for assessing the completion of hands-on experiments such as physics and chemistry in junior high school, and obtains training data for the AI model from a simulated experiment environment to improve the accuracy of model recognition.
FIG. 1 shows a schematic diagram of a model training system 100 based on simulation data, according to one embodiment of the invention. As shown in FIG. 1, the system 100 includes a simulation environment construction module 110, a simulation module 120, and a training module 130. The simulation environment construction module 110 is configured to construct, on a GPU (graphics processing unit, or graphics accelerator) that processes images and computes local image attributes, a simulation experiment environment corresponding to the real experiment environment. The simulation experiment environment includes simulation contents such as a simulation experiment table, simulation experiment instruments, a simulation character and actions corresponding to the real experiment environment. The simulated character may be a student, and its actions may include motion limits of the simulated character, path settings for picking up instruments, and the like. The simulated instruments provide feedback to the movements of the simulated character's skeleton just as they would in a real situation. Each simulation content is an independent simulation model, and each independent simulation model has its own motion-limiting conditions. For example, the simulated character's forearm can rotate at most 170 degrees around the elbow, and the fingers can flexibly pick up various simulated instruments; the motion limits are the same as those of a real person. Meanwhile, each independent simulation model has its own size and position information, so that it can be extracted and used later.
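As an illustration only, the following Python sketch shows one possible way to represent such independent simulation models, each carrying its own size, position and motion-limit information; the class name, field names and numeric values are assumptions made for this example and are not taken from the patent.

```python
from dataclasses import dataclass, field

@dataclass
class SimulationModel:
    """One independent simulation content item (bench, instrument, or character part)."""
    name: str
    size: tuple                      # (width, height, depth) in scene units
    position: tuple                  # (x, y, z) position in the simulated scene
    motion_limits: dict = field(default_factory=dict)  # e.g. joint name -> max rotation in degrees

# The simulated character's forearm rotates at most 170 degrees about the elbow,
# mirroring the limits of a real person; a test tube only needs size and position.
forearm = SimulationModel(
    name="forearm",
    size=(0.05, 0.30, 0.05),
    position=(0.40, 1.10, 0.20),
    motion_limits={"elbow_rotation_deg": 170},
)
test_tube = SimulationModel(name="test_tube", size=(0.02, 0.15, 0.02), position=(0.60, 1.00, 0.30))
```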
The simulation module 120 may, based on a reinforcement learning algorithm running on an AI accelerator, make the simulation character and the simulation experiment instruments in the simulation experiment environment complete the experiment contents according to preset standard experiment steps, so as to obtain simulation experiment data. During simulation, the AI accelerator judges the simulation contents against the preset correct moving positions and placement steps of the experiment instruments or articles. A penalty is applied when the moving position of a simulated instrument or article deviates severely, that is, the deviation exceeds a first preset threshold, or when a placement step of the instrument is wrong. A reward is given when the moving position of a simulated instrument or article approaches the set position, that is, the deviation is smaller than a second preset threshold, or when the placement steps of the simulated instrument are correct. In this way, the actions performed by the simulated character gradually move from the initially non-standard operations toward correct experimental operation, and a large amount of simulation experiment data can be obtained.
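A minimal sketch of reward shaping consistent with this description is given below; the threshold values, reward magnitudes and function signature are placeholder assumptions for illustration, not values specified by the patent.

```python
def step_reward(action_correct: bool,
                position_deviation: float,
                placement_correct: bool,
                first_threshold: float = 0.10,
                second_threshold: float = 0.02) -> float:
    """Penalize wrong actions, large position deviations or wrong placement steps;
    reward near-correct positions with correct actions and placement."""
    # Penalty: the character's action is wrong, the deviation exceeds the first
    # preset threshold, or a placement step is wrong.
    if (not action_correct) or (position_deviation > first_threshold) or (not placement_correct):
        return -1.0
    # Reward: action and placement are correct and the deviation is below the
    # second preset threshold.
    if position_deviation < second_threshold:
        return 1.0
    # Otherwise give neutral feedback while the behaviour keeps converging.
    return 0.0
```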
The training module 130 may input the real experiment data and the simulation experiment data into the AI model for training to obtain a trained AI model, so that the trained AI model can be used to recognize real experiment data. The training environment of the AI recognition model can be built on an AI accelerator and includes an interface for real experiment data and an interface for simulation experiment data.
To avoid model failure caused by overfitting when the model is trained with simulation experiment data only, real experiment data must also be added during model training. The real experiment data can be passed directly into the model through the interface. During training, credibility weights can be assigned to the real experiment data and the simulation experiment data respectively. The weight of the real experiment data may be set larger than the weight of the simulation experiment data: although there are more simulation data than real data, setting the weight (that is, the importance) of the real data higher ensures the reliability of model training and improves its accuracy. The specific weights can be adjusted according to the actual training effect. The simulation experiment data mainly play an optimizing role in the training process and are used to improve the accuracy of training on real experiment data. If the weight of the simulation data were set larger than that of the real data, overfitting would easily occur and the model would fail. Real data and simulation data are combined for deep learning training, and finally a model with a high recognition rate can be obtained, which can then be used to recognize the experiment data of real students.
The system partially replaces the manually produced real data required for training the AI recognition model with simulation data. The simulation data can be generated automatically in the simulation environment based on the reinforcement learning algorithm, without manual calibration, which saves a large amount of labor cost. The acquisition and calibration of simulation data are far more efficient than manual calibration, so the system accelerates the training of the AI recognition model.
Because the simulation environment and deep learning training both place high demands on hardware, the system splits them into two parts that run separately. The GPU, which is better at image processing, runs the simulation environment; the AI accelerator, which is more optimized for deep learning, runs the reinforcement deep learning to obtain simulation experiment data and runs model training based on the simulation data and real data. In this way, highly efficient operation is achieved.
FIG. 2 illustrates a block diagram of a computing device 200 according to one embodiment of the invention. As shown in FIG. 2, in a basic configuration 202, computing device 200 typically includes a system memory 206 and one or more processors 204. A memory bus 208 may be used for communication between the processor 204 and the system memory 206.
Depending on the desired configuration, the processor 204 may be any type of processor, including but not limited to: a microprocessor (μP), a microcontroller (μC), a digital signal processor (DSP), or any combination thereof. Processor 204 may include one or more levels of cache, such as a first level cache 210 and a second level cache 212, a processor core 214, and registers 216. The example processor core 214 may include an arithmetic logic unit (ALU), a floating point unit (FPU), a digital signal processing core (DSP core), or any combination thereof. The example memory controller 218 may be used with the processor 204, or in some implementations, the memory controller 218 may be an internal part of the processor 204.
Depending on the desired configuration, system memory 206 may be any type of memory, including but not limited to: volatile memory (such as RAM), non-volatile memory (such as ROM, flash memory, etc.), or any combination thereof. Physical memory in a computing device is usually volatile memory (RAM), and data on disk need to be loaded into physical memory before they can be read by processor 204. The system memory 206 may include an operating system 220, one or more applications 222, and program data 224. The application 222 is in effect a plurality of program instructions that instruct the processor 204 to perform corresponding operations. In some implementations, the application 222 can be arranged to be executed by the one or more processors 204 on the operating system using the program data 224. The operating system 220 may be, for example, Linux, Windows or the like, and includes program instructions for handling basic system services and performing hardware-dependent tasks. The application 222 includes program instructions for implementing various user-desired functions; the application 222 may be, for example, a browser, instant messaging software, a software development tool (e.g., an integrated development environment IDE, a compiler, etc.), but is not limited thereto. When an application 222 is installed into computing device 200, a driver module may be added to operating system 220.
When the computing device 200 starts up running, the processor 204 reads the program instructions of the operating system 220 from the memory 206 and executes them. Applications 222 run on top of operating system 220, utilizing interfaces provided by operating system 220 and underlying hardware, to implement various user-desired functions. When the user launches the application 222, the application 222 is loaded into the memory 206, and the processor 204 reads and executes the program instructions of the application 222 from the memory 206.
Computing device 200 also includes a storage device 232, where storage device 232 includes removable storage 236 and non-removable storage 238, where removable storage 236 and non-removable storage 238 are each connected to storage interface bus 234.
Computing device 200 may also include an interface bus 240 that facilitates communication from various interface devices (e.g., output devices 242, peripheral interfaces 244, and communication devices 246) to the basic configuration 202 via the bus/interface controller 230. The example output device 242 includes a graphics processing unit 248 and an audio processing unit 250, which may be configured to facilitate communication with various external devices, such as a display or speakers, via one or more A/V ports 252. The example peripheral interface 244 may include a serial interface controller 254 and a parallel interface controller 256, which may be configured to facilitate communication via one or more I/O ports 258 with external devices such as input devices (e.g., keyboard, mouse, pen, voice input device, touch input device) or other peripherals (e.g., printer, scanner, etc.). The example communication device 246 may include a network controller 260, which may be arranged to facilitate communication with one or more other computing devices 262 over a network communication link via one or more communication ports 264.
The network communication link may be one example of a communication medium. Communication media may typically be embodied by computer readable instructions, data structures, program modules, and may include any information delivery media in a modulated data signal, such as a carrier wave or other transport mechanism. A "modulated data signal" may be a signal that has one or more of its data set or changed in such a manner as to encode information in the signal. By way of non-limiting example, communication media may include wired media such as a wired network or special purpose network, and wireless media such as acoustic, radio Frequency (RF), microwave, infrared (IR) or other wireless media. The term computer readable media as used herein may include both storage media and communication media.
Computing device 200 also includes a storage interface bus 234 that is coupled to the bus/interface controller 230. The storage interface bus 234 is coupled to the storage device 232, and the storage device 232 is suitable for data storage. Exemplary storage 232 may include removable storage 236 (e.g., CD, DVD, USB flash drive, removable hard disk, etc.) and non-removable storage 238 (e.g., hard disk drive HDD, etc.).
In computing device 200 according to the invention, the application 222 includes a plurality of program instructions to perform method 300. FIG. 3 illustrates a flow chart of a model training method 300 based on simulation data according to one embodiment of the present invention. As shown in FIG. 3, the method 300 starts with step S310: constructing a simulation experiment environment corresponding to the real experiment environment, where the simulation experiment environment includes a simulation experiment table, simulation experiment instruments, a simulation character and actions. The GPU (graphics processor) can be used to build the simulation experiment environment, in which every step of an experiment can be realistically simulated to form a simulation experiment video. The simulation environment may include simulation contents such as a simulated operating bench, simulated experiment instruments and articles, a simulated student character, and actions. These simulation contents are set up independently as separate simulation models, and each independent simulation model has its own motion-limiting conditions. For example, the simulated character's forearm can rotate at most 170 degrees around the elbow, and the fingers can flexibly pick up various simulated instruments and perform operations such as pouring and titration; that is, the motion limits of the simulated character are the same as the movements of a real person. After the simulation models are built, the actions of the simulated character and the feedback of the simulated experiment instruments in response to those actions can be set. For example, the simulated character's actions may include skeletal motion limits and path settings for picking up a simulated instrument, while the paths for placing instruments or articles may be generated randomly. The simulated instruments or articles respond to the skeletal movements of the simulated character just as they would in reality, for example with the placement position, placement step and state of the simulated instrument.
Since the training data for the recognition model need to contain the states of articles under different lighting, various random simulated lighting conditions can be added during the simulation. Meanwhile, each independent simulation model has its own size and position information so that it can be extracted and used later; pictures of the simulation experiment instruments, the simulation character and the actions can be extracted automatically in the simulation environment.
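For illustration, a domain-randomization style sketch of how random lighting conditions might be sampled per simulation run is shown below; the parameter names and value ranges are assumptions for this example, not part of the patent.

```python
import random

def random_lighting() -> dict:
    """Sample one random simulated lighting condition (illustrative ranges only)."""
    return {
        "intensity": random.uniform(0.3, 1.5),               # relative brightness
        "color_temperature_k": random.uniform(3000, 6500),   # warm to cool light
        "direction": (random.uniform(-1, 1),                 # approximate light direction vector
                      random.uniform(-1, 1),
                      random.uniform(0.2, 1.0)),
    }
```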
Step S320 is then executed: based on the reinforcement learning algorithm, the simulation character and the simulation experiment instruments in the simulation experiment environment complete the experiment contents according to the preset standard experiment steps, so as to obtain simulation experiment data. Because the AI accelerator supports reinforcement deep learning and works together with the simulation environment, the experiment can be simulated automatically according to the experiment standard. Reinforcement learning is essentially a feedback system: instead of training a model with a large amount of data, an environment is created and a reward/penalty mechanism is established, giving a reward when a given goal is achieved and a penalty when it is not. The final objective is then optimized on the basis of these penalties and rewards, so as to obtain more rewards and minimal penalties.
In one embodiment of the invention, the simulated character's actions and the placement positions and placement steps of the simulated instruments during the simulation are judged against the preset correct character actions, instrument moving positions and placement steps. A penalty is applied when the moving position of a simulated instrument or article deviates severely, that is, the deviation exceeds a first preset threshold, or when a placement step of the instrument is wrong. A reward is given when the moving position of a simulated instrument or article approaches the set position, that is, the deviation is smaller than a second preset threshold, or when the placement steps of the simulated instrument are correct. In this way, the actions performed by the simulated character gradually move from the initially non-standard operations toward correct experimental operation. As the simulated character repeatedly performs this reinforcement learning training, a large number of simulation experiment videos can be obtained, containing simulation experiment data of both correct and incorrect operations, which can be used as training data for the subsequent model.
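The following sketch shows how repeated reinforcement-learning episodes could be run to accumulate simulation experiment videos; the env/agent interface (reset, step, act, learn, render_frame) is a hypothetical API invented for this example, not one defined by the patent.

```python
def run_episodes(env, agent, num_episodes: int):
    """Run reinforcement-learning episodes; each episode yields one simulation experiment video."""
    videos = []
    for _ in range(num_episodes):
        frames, state, done = [], env.reset(), False
        while not done:
            action = agent.act(state)               # choose the next simulated-character action
            state, reward, done = env.step(action)  # environment applies the reward/penalty rules
            agent.learn(state, reward)              # drift behaviour toward the standard experiment steps
            frames.append(env.render_frame())       # keep the rendered frame for later labeling
        videos.append(frames)                       # both correct and incorrect runs are kept
    return videos
```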
To obtain labeled simulation data, the obtained simulation experiment video can be decomposed into multiple frame images; the simulated character's actions and the position data of the simulated experiment instruments are then extracted from each frame; finally, the position data are used for labeling to obtain the simulation experiment data. In the simulation environment, every instrument and every human motion is independent: for example, the motion of a human hand is one simulation content and a test tube is another. Each simulation content forms position data in the simulation environment. By taking screenshots according to the position data of each simulation content, pictures of the simulated character's actions and of the simulated instruments or articles can be extracted from the simulation environment and labeled. The longer the simulation environment runs, the more simulation data become available.
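As a concrete illustration of this decomposition-and-labeling step, the sketch below uses OpenCV to split a video into frames and crop each simulation content by its known position data; the per-frame box format and function signature are assumptions made for this example.

```python
import cv2  # OpenCV; assumes the simulation video is saved as an ordinary video file

def extract_labeled_crops(video_path: str, positions_per_frame: list) -> list:
    """Decompose a simulation experiment video into frames and crop each simulation
    content (character action, instrument) using its known position data as the label."""
    samples = []
    cap = cv2.VideoCapture(video_path)
    frame_idx = 0
    while True:
        ok, frame = cap.read()
        if not ok or frame_idx >= len(positions_per_frame):
            break
        # positions_per_frame[frame_idx] maps a label name to an (x, y, w, h) box
        for label, (x, y, w, h) in positions_per_frame[frame_idx].items():
            samples.append((frame[y:y + h, x:x + w], label))
        frame_idx += 1
    cap.release()
    return samples
```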
Finally, step S330 is executed: the real experiment data and the simulation experiment data are input into the AI model for training to obtain a trained AI model, so that the trained AI model can be used to recognize real experiment data. Both the real experiment data and the simulation experiment data are labeled images, and the AI model in this embodiment is an image recognition model.
The training environment of the AI recognition model can be built with an AI accelerator and a deep learning framework, and includes an interface for real experiment data and an interface for simulation experiment data. The deep learning framework may provide a series of operators, such as convolutions, fully connected layers, various activation functions (e.g., ReLU, sigmoid) and various gradient update algorithms (e.g., Adam, RMSProp), supporting forward computation and backward gradient updates, and it can train the pre-built AI recognition model.
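Purely as an illustration of the kind of image recognition model such a framework could build and train, a small PyTorch convolutional network with an Adam optimizer is sketched below; the layer sizes, number of classes and learning rate are placeholders, not values from the patent.

```python
import torch
import torch.nn as nn

class ExperimentRecognizer(nn.Module):
    """Minimal convolutional image recognition model (placeholder architecture)."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),   # convolution + activation
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, num_classes)                  # fully connected layer

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

model = ExperimentRecognizer()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # Adam is one of the mentioned gradient-update algorithms
```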
To avoid overfitting when the model is trained with simulation experiment data only (that is, a small loss on the training set but a large loss on the validation or test set, causing the model to fail), real experiment data must also be added during training. The real experiment data can be passed directly into the model through the real-data interface. During training, credibility weights can be assigned to the real experiment data and the simulation experiment data respectively. The weight of the real experiment data may be larger than the weight of the simulation experiment data: although there are more simulation data than real data, setting the weight (that is, the importance) of the real data higher ensures the reliability of model training and improves its accuracy. The specific weights can be adjusted according to the actual training effect.
The simulation experiment data mainly play an optimizing role in the training process and are used to improve the accuracy of training on real experiment data. If the weight of the simulation data were set larger than that of the real data, overfitting would easily occur and the model would fail. The real data are passed into the deep learning training environment in the AI accelerator and trained together with the simulation data. Suppose only 100 items of real data exist originally: training those 100 real items together with 1000 simulated items can achieve roughly the training effect of 500 real items. In the end, the accuracy of the trained model in recognizing real students' experiment data is improved.
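One way to realize the described weighting during training is a per-sample weighted loss, sketched below in PyTorch; the 1.0 and 0.3 weights are placeholder values that would be tuned against the actual training effect, as the description suggests.

```python
import torch
import torch.nn.functional as F

def weighted_batch_loss(logits, targets, is_real,
                        real_weight: float = 1.0, sim_weight: float = 0.3):
    """Cross-entropy where real experimental samples weigh more than simulated ones.

    logits:  model outputs for the batch
    targets: ground-truth class indices
    is_real: boolean tensor, True for real experimental samples
    """
    per_sample = F.cross_entropy(logits, targets, reduction="none")
    weights = torch.where(is_real, torch.tensor(real_weight), torch.tensor(sim_weight))
    return (weights * per_sample).mean()
```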
Through this scheme, the simulation experiment environment is built on a high-performance GPU, and the reinforcement deep learning capability of the AI accelerator, working with the built simulation environment, makes the simulated character automatically complete the experiment contents and produces simulation experiment videos. Each obtained video can be decomposed into individual frames, and local images of the experiment instruments and the character's actions can be extracted from each frame and labeled automatically, yielding simulation experiment data. Finally, based on the deep learning training environment in the AI accelerator, real experiment data are combined with simulation experiment data for model training, so that the trained model achieves higher recognition accuracy.
Therefore, the beneficial effects achieved by the scheme at least comprise:
1. The amount of manually labeled real data required for model training is greatly reduced, since it can be partially replaced by simulated labeled data. The extraction and labeling of the simulation data are generated automatically by the system of this scheme, so no manual labeling is needed and a great deal of cost is saved.
2. The simulation environment and the deep learning training environment are split into two parts that run separately: the GPU runs the simulation environment, while the AI accelerator runs the reinforcement deep learning to obtain simulation experiment data and runs the model training based on the simulation data and real data, achieving higher operating efficiency.
3. The simulation experiment data are extracted and labeled automatically, so the labeled information is more accurate, which improves the accuracy of the training data and therefore the recognition accuracy of the trained model.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be construed as reflecting the intention that: i.e., the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules or units or components of the devices in the examples disclosed herein may be arranged in a device as described in this embodiment, or alternatively may be located in one or more devices different from the devices in this example. The modules in the foregoing examples may be combined into one module or may be further divided into a plurality of sub-modules.
Those skilled in the art will appreciate that the modules in the apparatus of the embodiments may be adaptively changed and disposed in one or more apparatuses different from the embodiments. The modules or units or components of the embodiments may be combined into one module or unit or component and, furthermore, they may be divided into a plurality of sub-modules or sub-units or sub-components. Any combination of all features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or units of any method or apparatus so disclosed, may be used in combination, except insofar as at least some of such features and/or processes or units are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings), may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features but not others included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments can be used in any combination.
Furthermore, some of the embodiments are described herein as methods or combinations of method elements that may be implemented by a processor of a computer system or by other means for performing the functions. Thus, a processor with the necessary instructions for implementing a method or a method element forms a means for implementing the method or the method element. Furthermore, the elements of the apparatus embodiments described herein are examples of the following apparatus: the apparatus is for carrying out the functions performed by the elements for carrying out the objects of the invention.
As used herein, unless otherwise specified the use of the ordinal terms "first," "second," "third," etc., to describe a general object merely denote different instances of like objects, and are not intended to imply that the objects so described must have a given order, either temporally, spatially, in ranking, or in any other manner.
While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having benefit of the above description, will appreciate that other embodiments are contemplated within the scope of the invention as described herein. Furthermore, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter. Accordingly, many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the appended claims. The disclosure of the present invention is intended to be illustrative, but not limiting, of the scope of the invention, which is defined by the appended claims.

Claims (9)

1. A model training method based on simulation data, adapted to be executed in a computing device, the method comprising:
constructing a simulation experiment environment corresponding to a real experiment environment, wherein the simulation experiment environment comprises a simulation experiment table, a simulation experiment instrument, a simulation character and actions;
based on a reinforcement learning algorithm, making the simulation character and the simulation experiment instrument in the simulation experiment environment complete experiment contents according to preset standard experiment steps to obtain simulation experiment data, which comprises: judging the actions of the simulation character and the moving positions and placement steps of the simulation experiment instruments in the simulation environment according to the preset standard experiment steps; applying a penalty when an error occurs in the simulation character's action, or the deviation between the moving position of a simulation experiment instrument and the standard moving position is larger than a first preset threshold, or an error occurs in a placement step of the simulation experiment instrument; giving a reward when the simulation character's action is correct, or the deviation between the moving position of the simulation experiment instrument and the standard moving position is smaller than a second preset threshold, or the placement step of the simulation experiment instrument is correct; and obtaining a simulation experiment video in the process of the simulation character completing the simulation experiment;
and inputting the real experimental data and the simulation experimental data into an AI model for training to obtain a trained AI model so as to identify the real experimental data by using the trained AI model.
2. The method of claim 1, wherein the AI model is an image recognition model and the real experimental data has a greater weight than the simulated experimental data during model training.
3. The method of claim 1, wherein the step of constructing a simulation experiment environment corresponding to the real experiment environment comprises:
setting the sizes and positions of a simulation experiment table, a simulation experiment instrument and a simulation figure required in the experiment process through a GPU;
setting the action of a simulation person and the feedback of a simulation experiment instrument in response to the action of the simulation person; and
random simulated light is added in the simulation experiment environment.
4. The method of claim 3, wherein the simulated character actions include a motion limit of the simulated character, a path setting for the simulated character to take a simulated laboratory instrument, and feedback from the simulated laboratory instrument in response to the simulated character actions includes a placement step, a placement location, and a status of the simulated laboratory instrument.
5. The method of claim 1, wherein the simulation experiment table, the simulation experiment instrument, the simulation character and the action in the simulation experiment environment are independently arranged.
6. The method of claim 1, wherein the step of obtaining simulation experiment data by making the simulation characters in the simulation experiment environment complete the experiment contents according to the preset standard experiment steps based on the reinforcement learning algorithm comprises:
decomposing the simulation experiment video into multi-frame images;
extracting the action of the simulated person and the position data of a simulated experimental instrument in each frame of image;
and extracting and labeling images of the simulated character actions and the simulated experiment instruments according to the position data to obtain simulated experiment data.
7. A model training system based on simulation data, the system comprising:
the simulation environment construction module is suitable for constructing a simulation experiment environment corresponding to the real experiment environment, wherein the simulation experiment environment comprises a simulation experiment table, a simulation experiment instrument, a simulation character and actions;
the simulation module is adapted to make the simulation character and the simulation experiment instrument in the simulation experiment environment complete experiment contents according to preset standard experiment steps based on a reinforcement learning algorithm to obtain simulation experiment data, which comprises: judging the actions of the simulation character and the moving positions and placement steps of the simulation experiment instruments in the simulation environment according to the preset standard experiment steps; applying a penalty when an error occurs in the simulation character's action, or the deviation between the moving position of a simulation experiment instrument and the standard moving position is larger than a first preset threshold, or an error occurs in a placement step of the simulation experiment instrument; giving a reward when the simulation character's action is correct, or the deviation between the moving position of the simulation experiment instrument and the standard moving position is smaller than a second preset threshold, or the placement step of the simulation experiment instrument is correct; and obtaining a simulation experiment video in the process of the simulation character completing the simulation experiment; and
the training module is suitable for inputting the real experimental data and the simulation experimental data into the AI model for training, so as to obtain a trained AI model, and the trained AI model is used for identifying the real experimental data.
8. A computing device, comprising: at least one processor and a memory storing program instructions, wherein the program instructions are configured to be adapted to be executed by the at least one processor, the program instructions comprising instructions for performing the method of any of claims 1-6.
9. A readable storage medium storing program instructions which, when read and executed by a computing device, cause the computing device to perform the method of any of claims 1-6.
CN202210576077.8A 2022-05-25 2022-05-25 Model training method and system based on simulation data Active CN115098998B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210576077.8A CN115098998B (en) 2022-05-25 2022-05-25 Model training method and system based on simulation data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210576077.8A CN115098998B (en) 2022-05-25 2022-05-25 Model training method and system based on simulation data

Publications (2)

Publication Number Publication Date
CN115098998A CN115098998A (en) 2022-09-23
CN115098998B (en) 2023-05-12

Family

ID=83289869

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210576077.8A Active CN115098998B (en) 2022-05-25 2022-05-25 Model training method and system based on simulation data

Country Status (1)

Country Link
CN (1) CN115098998B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110824954A (en) * 2019-10-24 2020-02-21 北京仿真中心 Intelligent agent training method and system, computer equipment and readable storage medium
CN111795700A (en) * 2020-06-30 2020-10-20 浙江大学 Unmanned vehicle reinforcement learning training environment construction method and training system thereof

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3884432A1 (en) * 2018-11-21 2021-09-29 Amazon Technologies, Inc. Reinforcement learning model training through simulation
CN111123738B (en) * 2019-11-25 2023-06-30 的卢技术有限公司 Method and system for improving training efficiency of deep reinforcement learning algorithm in simulation environment
CN111339642B (en) * 2020-02-13 2023-09-15 创新奇智(合肥)科技有限公司 Simulation model calibrating method, system, readable medium and equipment
DE102020003428A1 (en) * 2020-06-06 2020-08-13 FEV Software and Testing Solutions GmbH Method for creating a simulation model, use of a simulation model, computer program product, method for calibrating a control device
WO2022028926A1 (en) * 2020-08-07 2022-02-10 Telefonaktiebolaget Lm Ericsson (Publ) Offline simulation-to-reality transfer for reinforcement learning
CN111815789A (en) * 2020-08-12 2020-10-23 上海歌学信息技术有限公司 Step decomposition and evaluation system and method for virtual simulation experiment
CN114021330A (en) * 2021-10-28 2022-02-08 武汉中海庭数据技术有限公司 Simulated traffic scene building method and system and intelligent vehicle control method

Also Published As

Publication number Publication date
CN115098998A (en) 2022-09-23


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant