CN116340315A - Data archiving method and AI system based on digital factory

Info

Publication number
CN116340315A
CN116340315A (application number CN202310220864.3A)
Authority
CN
China
Prior art keywords
template
factory
description vector
data set
debugging
Prior art date
Legal status
Pending
Application number
CN202310220864.3A
Other languages
Chinese (zh)
Inventor
杨光城
尹贵云
Current Assignee
Individual
Original Assignee
Individual
Priority date
Filing date
Publication date
Application filed by Individual
Priority to CN202310220864.3A
Publication of CN116340315A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2216/00 Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/03 Data mining
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application relates to the technical fields of digital factories and artificial intelligence, and provides a data archiving method and an AI system based on a digital factory. A model supporting two-dimensional training is obtained from a similarity training mining network, which mines the surface layer overall description vector, and a bottom layer state information mining network, which mines the production situation description vector. When the factory data archiving model provided by the embodiments of the application archives a factory big data set to be archived, the factory big data are expressed and described by combining the surface layer description vector with the production situation description vector. This improves the model performance after collaborative debugging and increases the efficiency of description vector mining; classifying the factory big data set through the obtained integrated description vector yields accurate and reliable classification information, so that the accuracy, reliability and speed of archiving the factory big data set are increased.

Description

Data archiving method and AI system based on digital factory
Technical Field
The application relates to the technical field of digital factories and artificial intelligence, in particular to a data archiving method and an AI system based on the digital factories.
Background
With the popularity of the Industry 4.0 concept, digital factories are regarded as the future of the manufacturing industry. Manufacturing enterprises need to use resources to the greatest extent so that production becomes more efficient; this requires analyzing and controlling the manufacturing flow precisely by digital means, mastering the production situation, and keeping every link flexible to change so that production becomes more agile. A digital factory is built on a physical factory: relying on the architecture of the industrial internet, it provides the basic conditions for intelligent manufacturing and, starting from data acquisition, stores, processes, analyzes and presents data through information technology so that results can be fed back to production. Before feedback to the front end, factory big data sets should be archived reasonably to facilitate subsequent analysis; for example, archiving factory big data sets of the same type in the same storage space facilitates subsequent centralized analysis of the data sets in that storage space, so that production plans can be adjusted or abnormal links found. Therefore, how to classify and archive factory big data sets reasonably and accurately is a technical problem to be considered.
It should be noted that the foregoing is merely provided to facilitate understanding of the present application and is not to be construed as a basis for evaluating the novelty of the present application.
Disclosure of Invention
The invention aims to provide a data archiving method and an AI system based on a digital factory so as to solve the problems.
The embodiment of the application is realized in the following way:
in a first aspect, an embodiment of the present application provides a data archiving method based on a digital factory, applied to a data archiving AI system, the method including:
in response to a data archiving instruction, acquiring the factory big data set to be archived;
mining a first surface layer overall description vector of the factory big data set to be archived through a similarity training mining network in a pre-debugged factory data archiving model; wherein the first surface layer overall description vector indicates an overall initial information representation of the factory big data set to be archived;
mining a first production situation description vector of the factory big data set to be archived through a bottom layer state information mining network in the factory data archiving model; the first production situation description vector indicates a production situation information representation of the factory big data set to be archived;
performing description vector integration on the first surface layer overall description vector and the first production situation description vector through a production situation reasoning network in the factory data archiving model, and then classifying the factory big data set to be archived through the obtained first integrated description vector using combined indication information, to obtain the classification information of the factory big data set to be archived; the factory data archiving model is obtained through collaborative debugging on the basis of the similarity training mining network and the bottom layer state information mining network;
and archiving the factory big data set to be archived through its classification information.
As a possible implementation, the classifying of the factory big data set to be archived through the obtained first integrated description vector using combined indication information further includes:
respectively mining the second surface layer overall description vectors of each reference factory big data set in the factory big data set storage space through the similarity training mining network;
respectively mining the second production situation description vectors of each reference factory big data set through the bottom layer state information mining network;
performing description vector integration on each second surface layer overall description vector and the corresponding second production situation description vector one by one through the production situation reasoning network, to obtain the second integrated description vector corresponding to each reference factory big data set;
determining the commonality measurement result between each reference factory big data set and the factory big data set to be archived through the first integrated description vector and the second integrated description vector corresponding to each reference factory big data set;
and determining the matched factory big data set corresponding to the factory big data set to be archived according to each commonality measurement result.
As one possible implementation, the plant data archiving model further comprises an initial linear processing network;
before the first surface layer overall description vector of the factory big data set to be archived is mined through the similarity training mining network in the pre-debugged factory data archiving model, the method further comprises:
loading the factory big data set to be archived into the initial linear processing network in the factory data archiving model, and performing overall description vector mining on it through the initial linear processing network to obtain the overall description vector set corresponding to the factory big data set to be archived;
the mining of the first surface layer overall description vector of the factory big data set to be archived through the similarity training mining network in the pre-debugged factory data archiving model comprises: performing a lookup table mapping operation on the overall description vector set through the similarity training mining network to obtain the first surface layer overall description vector corresponding to the factory big data set to be archived;
the bottom layer state information mining network comprises a linear transformation module and a lookup table module; the mining of the first production situation description vector of the factory big data set to be archived through the bottom layer state information mining network in the factory data archiving model comprises: performing description vector mining on the production situation information in the overall description vector set through the linear transformation module in the bottom layer state information mining network, and performing a lookup table mapping operation on the mined production situation information through the lookup table module in the bottom layer state information mining network, to obtain the first production situation description vector corresponding to the factory big data set to be archived;
the tuning parameter of the production situation reasoning network is larger than the tuning parameter of other networks, and the other networks comprise the initial linear processing network, the similarity training mining network and the bottom layer state information mining network.
As one possible implementation, the initial linear processing network includes a plurality of linear transformation modules;
the network parameters corresponding to the linear transformation modules in the initial linear processing network are obtained from parameters debugged in advance through a preset debugging template set, and the network parameters corresponding to the linear transformation module in the bottom layer state information mining network are initialized in an arbitrary manner.
As a possible implementation, the debugging process of the factory data archiving model comprises the following steps:
acquiring a debugging template library, and determining a debugging template set in the debugging template library;
loading the determined debugging template set into the factory data archiving model to be debugged, and acquiring the third surface layer overall description vector produced by the similarity training mining network in the factory data archiving model, the third production situation description vector produced by the bottom layer state information mining network, and the inference result produced by the production situation inference network, the inference result representing the inference confidence of the debugging template for each piece of classification indication information;
establishing a target error result through the third surface layer integral description vector, the third production situation description vector and the reasoning result, and performing iterative optimization on network parameters of the factory data archiving model through the target error result until the factory data archiving model meets a preset optimization cut-off condition, so as to obtain a debugged factory data archiving model;
the establishing a target error result through the third surface layer overall description vector, the third production situation description vector and the inference result comprises: establishing a first multi-element error result through the third surface layer overall description vectors corresponding to the debugging templates in the debugging template set; establishing a second multi-element error result through the third production situation description vectors corresponding to the debugging templates; establishing a combined indication information error result through the inference result and the combined indication information vector corresponding to each debugging template, wherein the combined indication information vector characterizes the actual confidence of each piece of classification indication information corresponding to the debugging template; acquiring a cross entropy error result through the inference result corresponding to each debugging template; and combining the first multi-element error result, the second multi-element error result, the combined indication information error result and the cross entropy error result according to a preset calculation mode to obtain the target error result.
As one possible implementation, each set of debug templates comprises three debug templates, a reference template, a positive template, and a negative template, respectively;
the step of establishing the first multi-element error result through the third surface layer overall description vector corresponding to each debugging template in the debugging template set comprises the following steps:
determining a first Euclidean distance between the third surface layer overall description vector corresponding to the reference template in the debugging template set and that corresponding to the positive template, and a second Euclidean distance between the third surface layer overall description vector corresponding to the reference template and that corresponding to the negative template;
summing the difference of the first Euclidean distance and the second Euclidean distance with a preset reference value, and determining the summation result as a first target parameter, wherein the preset reference value represents the extreme difference of the commonality measurement result between the positive template and the negative template;
and determining the larger of the first target parameter and a candidate parameter as the first multi-element error result.
as one possible implementation, each set of debug templates comprises three debug templates, a reference template, a positive template, and a negative template, respectively;
The step of establishing the second multi-element error result through the third production situation description vector corresponding to each debugging template comprises the following steps:
determining a third Euclidean distance between the third production situation description vector corresponding to the reference template in the debugging template set and that corresponding to the positive template, and a fourth Euclidean distance between the third production situation description vector corresponding to the reference template and that corresponding to the negative template;
summing the difference of the third Euclidean distance and the fourth Euclidean distance with the preset reference value, and determining the summation result as a second target parameter, wherein the preset reference value represents the extreme difference of the commonality measurement result between the positive template and the negative template;
and determining the larger of the second target parameter and the candidate parameter as the second multi-element error result.
As one possible implementation, before the iterative optimization of the network parameters of the factory data archiving model through the target error result, the method further comprises:
combining the first multi-element error result, the combined indication information error result and the cross entropy error result according to a preset calculation mode to obtain a temporary error result;
and iteratively debugging the network parameter values of the bottom layer state information mining network in the factory data archiving model for a preset number of optimization rounds through the temporary error result.
As a possible implementation, the debugging template sets in the debugging template library are obtained based on the following steps:
constructing a positive template group from every two factory big data set templates whose commonality measurement result is larger than a preset commonality measurement result, based on the obtained commonality measurement results among the plurality of factory big data set templates;
performing template combination through each positive template group to obtain a plurality of debugging template sets for establishing the debugging template library, each debugging template set comprising three debugging templates, namely a reference template, a positive template and a negative template, wherein the reference template and the positive template are factory big data set templates whose commonality measurement result is larger than the preset commonality measurement result, and the negative template and the positive template are factory big data set templates whose commonality measurement result is not larger than the preset commonality measurement result;
the template combination performed based on the positive template groups to obtain a plurality of debugging template sets for establishing the debugging template library comprises the following steps:
determining one factory big data set template in a positive template group as the selected template, and determining one factory big data set template in each of the remaining template groups as a temporary template;
determining the distance between the selected template and each temporary template respectively;
sorting the temporary templates by their corresponding distances in ascending or descending order, and determining at least one temporary template at a set position in the order as a negative template corresponding to the selected template;
and constructing a debugging template set from each determined negative template and the positive template group, where the selected template in the positive template group is the positive template of the debugging template set and the other template in the positive template group is its reference template.
In a second aspect, an embodiment of the present application provides a data archiving AI system, including a processor and a memory, where the memory stores a computer program, and the processor is configured to execute the computer program to implement the method described above.
According to the data archiving method and AI system based on the digital factory, a model supporting two-dimensional training is obtained from the similarity training mining network for mining the surface layer overall description vector and the bottom layer state information mining network for mining the production situation description vector. When the factory data archiving model provided by the embodiments of the application archives the factory big data set to be archived, the factory big data are expressed and described by combining the surface layer description vector with the production situation description vector, which improves the model performance after collaborative debugging and increases the efficiency of description vector mining. Classifying the factory big data set through the obtained integrated description vector yields accurate and reliable classification information, so that the accuracy, reliability and speed of archiving the factory big data set are increased.
Other features will be set forth in part in the description that follows. Upon review of the ensuing disclosure and the accompanying figures, those skilled in the art will discover these features in part, or will be able to ascertain them through production or use. The features of the present application may be implemented and obtained by practicing or using the various aspects of the methods, tools and combinations set forth in the detailed examples described below.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings that are required to be used in the description of the embodiments of the present application will be briefly described below.
Fig. 1 is a flowchart of a data archiving method based on a digital factory according to an embodiment of the present application.
Fig. 2 is a schematic diagram of a functional module architecture of a data archiving apparatus according to an embodiment of the present application.
Fig. 3 is a schematic diagram of a data archiving AI system according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described below with reference to the accompanying drawings in the embodiments of the present application. The terminology used in the description of the embodiments of the application is for the purpose of describing particular embodiments of the application only and is not intended to be limiting of the application.
The execution body of the data archiving method based on the digital factory in the embodiments of the application is a computer device (i.e., the data archiving AI system provided in the embodiments of the application), including but not limited to a server, a personal computer, a notebook computer, a tablet computer, a smart phone, etc. The server may be, for example but not limited to, a single network server, a server group formed by a plurality of network servers, or a cloud formed by a large number of computers or network servers in cloud computing, where cloud computing is a kind of distributed computing and such a cloud is a super virtual computer formed by a group of loosely coupled computers. The computer device can implement the application by running alone, or it can access a network and implement the application by interacting with other computer devices in the network. The network in which the computer device is located includes but is not limited to the internet, a wide area network, a metropolitan area network, a local area network, a VPN network, etc.
The embodiment of the application provides a data archiving method based on a digital factory, which is applied to a data archiving AI system, as shown in fig. 1, and comprises the following steps:
step 100: in response to the data archiving instruction, a large data set of the plant to be archived is obtained.
In the embodiments of the application, a factory big data set is data collected by equipment and reported by workers in the factory production process, such as equipment operation data, actual data of set indexes, production procedure data, procedure circulation data, worker operation data, etc. The factory digitizes the production process by deploying an industrial internet: configured equipment such as numerical control machine tools, instruments and meters, industrial sensors and industrial robots sits in the end layer of the industrial internet, and the generated data, after data acquisition, protocol analysis and edge processing in the edge layer, is uploaded through network services to the platform layer (PaaS). The data archiving method based on the digital factory provided by the embodiments of the application can be applied to an industrial big data system in the platform layer. The factory big data set to be archived may be data called by the AI system deployed in the platform layer from the infrastructure layer (IaaS).
Step 200: and excavating a first surface layer overall description vector of the large data set of the factory to be archived by a similarity training excavation network in the factory data archiving model which is debugged in advance, wherein the first surface layer overall description vector indicates the overall initial information representation of the large data set of the factory to be archived.
Step 300: and mining a first production situation description vector of the large data set of the factory to be archived through a bottom layer state information mining network in the factory data archiving model, wherein the first production situation description vector indicates the production situation information representation of the large data set of the factory to be archived.
In the embodiments of the application, the surface layer overall description vector represents global data information of the surface layer of the factory big data set, while the production situation description vector is obtained by deep feature mining of embedded information containing the global description of the factory big data set. The factory data archiving model provided in the embodiments of the application combines the surface layer overall description vector with the bottom layer state information description vector (i.e., the production situation description vector): the model comprises the similarity training mining network and the bottom layer state information mining network, mines the corresponding description vectors through these two mining networks, and classifies the factory big data set accordingly, a description vector being the vector information expression of the data. The similarity training mining network is trained through similarity learning, which quantitatively measures the correlation between input templates based on supervised learning.
Step 400: and integrating the description vector of the first surface layer overall description vector with the first production situation description vector through a production situation reasoning network in the factory data archiving model, and then classifying the factory big data set to be archived through the obtained first integration description vector by adopting combined indication information to obtain classification information of the factory big data set to be archived.
In the embodiment of the application, the factory data archiving model is obtained by adopting collaborative debugging based on a similarity training mining network and a bottom layer state information mining network.
Step 500: and archiving the large data set of the factory to be archived through the classification information of the large data set of the factory to be archived.
At archiving, it is understood that the factory data sets to be archived with the same categorization information are stored in the same storage space for subsequent production analysis.
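For illustration only, a minimal sketch of this archiving step follows, assuming Python, assuming that classification information is a plain class label, and assuming that each storage space can be addressed by that label; the names storage_spaces and archive_dataset are illustrative and not part of the embodiment.

    from collections import defaultdict

    # Illustrative stand-in for the storage spaces: one list of archived
    # factory big data sets per piece of classification information.
    storage_spaces = defaultdict(list)

    def archive_dataset(dataset, classification_info):
        """Store a factory big data set in the storage space shared by all
        data sets carrying the same classification information (step 500)."""
        storage_spaces[classification_info].append(dataset)

Data sets archived under the same key can then be analyzed centrally, as described above.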
Based on the above implementation process, the embodiments of the application obtain a model capable of two-dimensional training, based on the similarity training mining network for mining the surface layer overall description vector and the bottom layer state information mining network for mining the production situation description vector. The factory big data set to be archived can be classified through the factory data archiving model of the embodiments of the application, with the factory big data expressed and described by combining the surface layer description vector and the production situation description vector. This improves the model performance after collaborative debugging and the description vector mining efficiency; classifying the factory big data set through the obtained integrated description vector yields accurate and reliable classification information, improving the accuracy, reliability and speed of factory big data set archiving.
Classifying the factory big data set to be archived through the obtained first integrated description vector using combined indication information may further comprise the following steps:
Step 410: respectively mining the second surface layer overall description vectors of each reference factory big data set in the factory big data set storage space through the similarity training mining network, and respectively mining their second production situation description vectors through the bottom layer state information mining network.
It will be appreciated that the data connotation indicated by the second surface layer overall description vector is consistent with that indicated by the first surface layer overall description vector: both indicate the overall initial information representation of a factory big data set, the first surface layer overall description vector being for the factory big data set to be archived and the second for a reference factory big data set. Likewise, the second production situation description vector is consistent in data meaning with the first production situation description vector, indicating the production situation information representation of a factory big data set.
Step 420: and integrating the description vectors of the whole description vectors of each second surface layer and the corresponding second production situation description vectors one by one through a production situation reasoning network to obtain second integration description vectors corresponding to each large data set of each reference factory.
Step 430: and determining the commonality measurement result of each reference factory big data set and the factory big data set to be archived respectively through the first integration description vector and the second integration description vector corresponding to each reference factory big data set.
Step 440: and determining a matching factory large data set corresponding to the factory large data set to be archived through each common measurement result.
Through the above steps, the matched factory big data set corresponding to the factory big data set to be archived in the factory big data set storage space is determined, where the matched factory big data set comprises factory big data sets highly similar to the one to be archived, i.e., reference factory big data sets whose commonality measurement result is larger than a set value. The overall data connotation and the production situation of a factory big data set may be related, and some production situations exist only under particular overall data connotations; for example, equipment overload generally does not occur during load-limited production. The data archiving method based on the digital factory provided by the embodiments of the application learns based on this relation, which can improve the performance of the model and the classification accuracy of factory big data sets.
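As a hedged illustration of steps 430 and 440 (the embodiment does not fix a concrete commonality measure), the following sketch assumes a PyTorch implementation and uses cosine similarity between integrated description vectors; all names and the set value are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    def matched_datasets(first_vec, second_vecs, set_value=0.8):
        """Return the indices of reference factory big data sets whose
        commonality measurement with the data set to be archived exceeds
        the set value, plus the index of the single best match.

        first_vec:   (D,)   first integrated description vector
        second_vecs: (N, D) second integrated description vectors
        """
        sims = F.cosine_similarity(first_vec.unsqueeze(0), second_vecs, dim=1)
        matched = (sims > set_value).nonzero(as_tuple=True)[0]
        return matched, int(sims.argmax())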
As one implementation, the factory data archiving model provided by the embodiments of the application further includes an initial linear processing network. In other words, the factory data archiving model includes an initial linear processing network, a similarity training mining network, a bottom layer state information mining network and a production situation reasoning network.
As one embodiment, the initial linear processing network includes a plurality of linear transformation modules (such as convolution units), for example a linear transformation network with a residual structure, and the bottom layer state information mining network may include a linear transformation module with a residual structure and a lookup table module, where the linear transformation module performs deep convolution operations and the lookup table module performs deep embedding operations. Deeper data information can be extracted through the linear transformation modules of the residual structure, the number of stacked modules being chosen according to the actual situation.
In the embodiments of the application, the network parameters corresponding to the linear transformation modules in the initial linear processing network are determined from parameters debugged in advance through a preset debugging template set. The initial linear processing network provided in the embodiments of the application may include a plurality of cascaded linear transformation modules whose output results have different dimensions. The specific structure of each linear transformation module, such as the convolution matrix size, the moving step length, the residual blocks (each residual block may include a plurality of convolution layers, for example 3 cascaded convolution layers of sizes 1×1, 3×3 and 1×1, the convolution layers being connected by activation functions), whether downsampling is included, and the downsampling size, can all be adapted to the actual situation, which is not limited in the embodiments of the application; the parameters of the residual units debugged in the plurality of linear transformation modules are determined as the initial network parameters.
The similarity training mining network provided by the embodiments of the application may comprise two subunits, namely a maximum-value downsampling unit and a classification mapping unit, which differ in output size; the size of the result output by the classification mapping unit is the embedding dimension of the surface layer overall description vector.
As one implementation, the network parameter values corresponding to the linear transformation module of the bottom layer state information mining network may be initialized in an arbitrary manner, such as from a distribution with preset mean and variance. The network composition and parameters corresponding to the bottom layer state information mining network and the production situation reasoning network can be configured according to the actual situation; for example, they comprise a linear transformation unit (comprising 2 residual blocks, each composed of 3 cascaded convolution units of different sizes), a downsampling unit (such as maximum-value downsampling), an embedding unit (such as a fully connected FC layer) and a classification mapping unit.
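For concreteness, a sketch of one such residual block follows (PyTorch assumed; the channel count and the ReLU activations are illustrative choices consistent with the 1×1, 3×3, 1×1 cascade described above, not a mandated structure).

    import torch.nn as nn

    class ResidualBlock(nn.Module):
        """Residual block with 3 cascaded convolution layers (1x1, 3x3, 1x1)
        connected by activation functions, plus an identity shortcut."""
        def __init__(self, channels):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(channels, channels, kernel_size=1), nn.ReLU(),
                nn.Conv2d(channels, channels, kernel_size=3, padding=1), nn.ReLU(),
                nn.Conv2d(channels, channels, kernel_size=1),
            )
            self.act = nn.ReLU()

        def forward(self, x):
            # Deeper data information is extracted by the convolution stack;
            # the shortcut keeps gradients flowing through stacked blocks.
            return self.act(self.body(x) + x)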
In the embodiments of the application, the model capable of two-dimensional training is obtained by configuring a bottom layer residual description vector mining module on the basis of a general description vector mining model. The model is pre-debugged and then fine-tuned in multiple stages so that the new module converges to the classification task, while the efficiency of the debugging task and of repeated debugging is also taken into account; the factory big data are expressed and described by combining the surface layer description vector with the production situation description vector, which facilitates subsequent application.
As an implementation, the embodiments of the application further provide asynchronous tuning parameters (representing learning rates) to enhance the capability of the model: the tuning parameters of the initial linear processing network, the similarity training mining network and the bottom layer state information mining network are all, for example, 0.0001, while the tuning parameter of the production situation inference network is, for example, 0.001. In other words, the tuning parameter of the production situation inference network is greater than those of the remaining networks, the remaining networks comprising the initial linear processing network, the similarity training mining network and the bottom layer state information mining network.
Classification with combined indication information aims to obtain consistent inference results for templates pointing to the same category; in other words, they fit the same debugging target. The fully connected unit fits this target more readily, which drags the embedded information toward the same fit: the embeddings of factory big data sets of the same classification converge on the classification target and lose their distinguishing characteristics. Based on this, the asynchronous tuning parameter values of the embodiments of the application make the parameter value optimization speed of the embedding unit lower than that of the fully connected unit (for example, one tenth of it), avoiding overfitting the embedding to the combined indication information target during each round of parameter value optimization.
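A minimal sketch of these asynchronous tuning parameters follows, assuming a PyTorch optimizer with per-group learning rates; the sub-network attribute names and the choice of SGD are assumptions for illustration only.

    import torch

    def build_optimizer(model):
        """Give the production situation reasoning network a tuning parameter
        (learning rate) ten times that of the remaining networks."""
        return torch.optim.SGD([
            {"params": model.initial_linear.parameters(),      "lr": 1e-4},
            {"params": model.similarity_mining.parameters(),   "lr": 1e-4},
            {"params": model.state_mining.parameters(),        "lr": 1e-4},
            {"params": model.situation_reasoning.parameters(), "lr": 1e-3},
        ])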
The following describes the process of classifying the factory big data set through the factory data archiving model, which specifically comprises the following steps:
Step 101: loading the factory big data set to be archived into the initial linear processing network in the factory data archiving model, and performing overall description vector mining on it based on the initial linear processing network to obtain the overall description vector set corresponding to the factory big data set to be archived.
Step 102: and carrying out lookup table mapping operation on the whole description vector set through a similarity training mining network to obtain a first surface layer whole description vector corresponding to the large data set of the factory to be archived.
Step 103: and mining a first production situation description vector of the large data set of the factory to be archived through a bottom state information mining network in the factory data archiving model.
Step 104: and carrying out description vector mining on the production situation information in the whole description vector set through a linear transformation module in the bottom layer state information mining network, and carrying out lookup table mapping operation on the mined production situation information through a lookup table module in the bottom layer state information mining network to obtain a first production situation description vector corresponding to the large data set of the factory to be archived.
Step 105: and integrating the description vector of the first surface layer overall description vector with the first production situation description vector through a production situation reasoning network in the factory data archiving model, and then classifying the factory big data set to be archived through the obtained first integration description vector by adopting combined indication information to obtain classification information of the factory big data set to be archived.
The debugging process of the model provided by the embodiments of the application can be described as follows. The debugging templates are organized as multi-element groups, forming debugging template sets; one debugging template set comprises three debugging templates: a reference template, a positive template and a negative template. The reference template and the positive template are the same or similar factory data sets, in other words factory big data set templates whose commonality measurement result is larger than a preset commonality measurement result; the reference template and the negative template are dissimilar factory big data sets, in other words the negative template and the positive template are factory big data set templates whose commonality measurement result is not larger than the preset commonality measurement result.
The embodiments of the application can annotate factory big data set template groups for multi-element-group debugging: when annotating templates for factory big data set similarity debugging, three templates meeting the conditions are screened from the factory big data sets each time to form a multi-element debugging template set. However, multi-element group data gathered at random contains a large number of simple templates (templates that are easy to distinguish). Simple templates are useful at the start of training, but after a period of debugging the model has formed strong recognition performance on them, so they contribute little compared with difficult templates (templates that are hard to distinguish); because simple templates dominate the debugging signal, the difficult templates are effectively ignored, and continuing to gather large numbers of template groups on this basis strongly affects the model. Therefore, the embodiments of the application divide the acquired factory big data set templates into debugging template sets. The process of establishing the debugging template library can refer to the following steps:
Step 201: and constructing an active template group by using every two factory large data set templates with the commonality measurement result larger than the preset commonality measurement result based on the obtained commonality measurement results among the plurality of factory large data set templates.
Step 202: and carrying out template combination through each positive template group to obtain a plurality of debugging template sets for establishing a debugging template library.
That is, the acquired plurality of factory large dataset templates are decomposed into a plurality of active template groups, because both templates in each active template group are the same or similar, one of which may be considered a reference template in the multi-template and the other of which may be considered an active template in the multi-template. However, two templates obtained from different active template groups may be templates with low commonality measurement results, so that template combination can be performed on multiple active template groups, multiple debug template sets are established through one active template group, and a sufficient number of debug template sets can be provided for establishing a debug template library after the multiple active template groups are combined.
As a possible implementation, establishing a plurality of debugging template sets from one positive template group can be realized through the following steps:
(1) One factory big data set template in the positive template group is determined as the selected template, and one factory big data set template in each of the remaining template groups is determined as a temporary template.
(2) The distance between the selected template and each temporary template is determined separately.
(3) The temporary templates are sorted by their corresponding distances in ascending or descending order, and at least one temporary template at a set position in the order is determined as a negative template corresponding to the selected template.
The set position may be the last X templates when the temporary templates are sorted by increasing distance, or the first X templates when sorted by decreasing distance; in other words, the X temporary templates farthest (in vector distance) from the selected template are determined as negative templates.
(4) Each determined negative template is combined with the positive template group to construct a debugging template set, where the selected template in the positive template group is the positive template of the debugging template set and the other template in the positive template group is its reference template.
Based on the above implementation process, each negative template can construct a debugging template set with the positive template group; for one positive template group, X negative templates are determined, yielding X debugging template sets. If there are X positive template groups, X × X debugging template sets can be obtained.
For example, positive template group A includes two factory big data set templates: factory big data set template A and factory big data set template B. Take factory big data set template A as the selected template, and then determine 5 factory big data set templates from positive template groups B to G as reference factory big data sets, namely factory big data set templates C, E, G, I and K. Acquire the distances between factory big data set template A and these 5 reference factory big data sets. If X = 3 and the templates farthest from factory big data set template A are templates C, E and G, three debugging template sets are established from these 3 temporary templates: (template A, template B, template C), (template A, template B, template E) and (template A, template B, template G).
In practical application, the multi-element-group annotation can be simplified to annotating only positive template groups, obtaining similar template groups, with the multi-element groups then extracted within each batch of template groups as follows: for the selected template in one positive template group, randomly acquire one factory big data set from each of the remaining H - 1 positive template groups as temporary templates; then acquire the distance between each temporary template and the selected template; select the M farthest temporary templates as negative templates according to distance, and form multi-element groups with the positive template of the selected template group respectively. In this way each template yields M multi-element groups, and all batches yield M × H multi-element groups, where H is the number of templates debugged in a single pass (batch_size), whose value is relatively large; the batch size is a hyperparameter expressing the number of templates to be processed before parameter values are optimized. A sketch of this extraction follows.
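The sketch below assumes Python/PyTorch; embed stands for any mapping from a template to its description vector and, like all other names here, is an assumption rather than part of the embodiment.

    import random
    import torch

    def mine_multi_element_groups(positive_groups, embed, m=3):
        """positive_groups: list of (selected_template, partner_template) pairs,
        one pair per positive template group in the batch (H pairs in total).
        Returns (reference, positive, negative) debugging template sets, with
        negatives chosen as the M temporary templates farthest from the
        selected template in description vector space."""
        groups = []
        for i, (selected, partner) in enumerate(positive_groups):
            # one randomly taken template from each of the remaining H - 1 groups
            temps = [random.choice(pair) for j, pair in enumerate(positive_groups) if j != i]
            dists = torch.stack([(embed(selected) - embed(t)).norm() for t in temps])
            for k in dists.topk(min(m, len(temps))).indices.tolist():
                # partner is the reference template, selected the positive one
                groups.append((partner, selected, temps[k]))
        return groups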
The preparation of the debugging template is completed, and the specific debugging process of the factory data archiving model can refer to the following steps:
step 301: obtaining a debugging template library, and determining a debugging template set in the debugging template library.
Step 302: loading the determined debugging template set into a debugged factory data archiving model, and obtaining a third surface layer integral description vector produced by the mining network based on similarity training in the factory data archiving model, a third production situation description vector produced by the mining network of bottom layer state information, and an inference result of the production situation inference network.
Step 303: establishing a temporary error result through the third surface layer integral description vector and the reasoning result, and performing iterative debugging on network parameter values of the bottom layer state information mining network in the factory data archiving model according to a preset optimization round based on the temporary error result.
For example, a first multi-element error result is established through the third surface layer integral description vector and the reasoning result, the indication information error result and the cross entropy error result are combined, the first multi-element error result, the indication information error result and the cross entropy error result are combined, and the temporary error result is obtained through calculation according to a preset calculation mode.
Step 304: establishing a target error result through the third surface layer integral description vector, the third production situation description vector and the reasoning result, performing iterative optimization on network parameters of the factory data archiving model through the target error result until the factory data archiving model meets a preset optimization cut-off condition, and outputting the debugged factory data archiving model.
For example, a first multi-element error result, a second multi-element error result, a combined indication information error result and a cross entropy error result are established through a third surface layer integral description vector, a third production situation description vector and an inference result, and the first multi-element error result, the second multi-element error result, the combined indication information error result and the cross entropy error result are calculated according to a preset calculation mode to obtain a target error result.
In application, f iterations of optimization can be carried out over all templates of the debugging template library through the above steps, with all templates processed once per iteration; f is a hyperparameter representing the number of passes over the whole debugging template library. Each iteration takes H templates as one batch, giving W batches, and executes the following steps for each batch:
I: all parameters of the model are configured into a state to be debugged, and a loaded factory big data set is transmitted forward to obtain an inference result.
II: and obtaining a multi-element error, wherein the multi-element error comprises a first multi-element error result, a second multi-element error result, a combined indication information error result and a cross entropy error result, and obtaining a total error result.
III: and adjusting model parameters through a gradient descent strategy.
The debugging comprises two stages: (1) repeated iteration through the weighted summation of three error results, which optimizes the fully connected units; (2) repeated iteration through the weighted summation of the four error results until the preset optimization cut-off condition is met, which optimizes the overall parameters of the model.
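The two stages can be sketched as follows (PyTorch assumed). The weighted combination, the concrete forms of the combined indication information error and the cross entropy error, the loader protocol, and the assumption that the model returns the two description vectors plus the inference result for each template are all illustrative choices, not mandated by the embodiment; the multi-element error implements the L1 formula given further below.

    import torch
    import torch.nn.functional as F

    def multi_element_error(ref, pos, neg, u=0.2):
        """max(S1 - S2 + u, 0), with S1/S2 Euclidean distances to the positive
        and negative templates and u the preset reference value (assumed 0.2)."""
        s1 = (ref - pos).norm(dim=-1)
        s2 = (ref - neg).norm(dim=-1)
        return F.relu(s1 - s2 + u).mean()

    def debug_model(model, loader, opt_stage1, opt_stage2, warmup, epochs, w):
        """Stage 1 (first `warmup` rounds): weighted sum of three error results.
        Stage 2: weighted sum of all four error results; a fixed epoch budget
        stands in here for the preset optimization cut-off condition."""
        for epoch in range(epochs):
            for ref, pos, neg, y in loader:  # debugging template set + indication vector
                sr, tr, logits = model(ref)  # surface vec, situation vec, inference
                sp, tp, _ = model(pos)
                sn, tn, _ = model(neg)
                l1 = multi_element_error(sr, sp, sn)                # 1st multi-element error
                l2 = multi_element_error(tr, tp, tn)                # 2nd multi-element error
                l3 = F.binary_cross_entropy_with_logits(logits, y)  # combined indication error (assumed form)
                l4 = F.cross_entropy(logits, y.argmax(dim=-1))      # cross entropy error (assumed form)
                stage1 = epoch < warmup
                loss = w[0]*l1 + w[2]*l3 + w[3]*l4 if stage1 \
                    else w[0]*l1 + w[1]*l2 + w[2]*l3 + w[3]*l4
                opt = opt_stage1 if stage1 else opt_stage2
                opt.zero_grad()
                loss.backward()
                opt.step()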
The acquisition of the error results is described below. They mainly include three types: multi-element error results, a cross entropy error result and a combined indication information error result. The first multi-element error result and the second multi-element error result are established through the surface layer overall description vector and the production situation description vector respectively. In the embodiments of the application, the target error result is mainly obtained by combining these four error results according to a preset calculation mode.
As one embodiment, the obtaining of the error result includes:
step 410: and establishing a first multi-element error result through the third surface layer integral description vector corresponding to each debugging template in the debugging template set.
The third surface layer overall description vector is consistent with the information indicated by the first surface layer overall description vector and the second surface layer overall description vector, the third surface layer overall description vector is aimed at a debugging template, and similarly, the third production situation description vector and the reasoning result can be understood as well.
The process of establishing the first multivariate error result is specifically:
step 411: and determining a first Euclidean distance between a third surface layer integral description vector corresponding to the reference template and a third surface layer integral description vector corresponding to the positive template group in the debugging template set and a second Euclidean distance between the third surface layer integral description vector corresponding to the reference template and a third surface layer integral description vector corresponding to the negative template.
Step 412: and summing the difference value of the first Euclidean distance and the second Euclidean distance with a preset reference value to obtain a summation result, determining the summation result as a first target parameter, wherein the preset reference value is used for representing the difference extreme value of the commonality measurement result between the active template and the passive template.
Step 413: and determining the larger parameter of the first target parameter and the candidate parameter as a first multi-component error result.
After the multiple sets are determined in the batch template, a first multiple error result is determined by the surface layer overall description vector of the multiple templates. For example, the multivariate error result may be determined using the following formula:
L1 = max(S1 - S2 + u, 0)
where L1 is the first multivariate error result, u is the preset reference value, S1 is the first Euclidean distance between the third surface layer overall description vector corresponding to the reference template and that corresponding to the positive template, and S2 is the second Euclidean distance between the third surface layer overall description vector corresponding to the reference template and that corresponding to the negative template. The candidate parameter equals 0: if the first target parameter S1 - S2 + u is greater than 0, the first multivariate error result is the first target parameter; otherwise, if S1 - S2 + u is less than or equal to 0, the first multivariate error result is 0.
The first multivariate error result drives the distance between the surface layer overall description vectors of the reference template and the negative template to exceed the distance between those of the reference template and the positive template by at least u. Under this error result, the surface layer overall description vectors of the reference template and the positive template are pulled close while those of the reference template and the negative template are pushed apart, so that positive and negative templates keep a larger distance in the description vector space and classification accuracy improves.
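This is the familiar triplet-margin form. A minimal PyTorch-style sketch, assuming the description vectors arrive as tensors and using an illustrative margin value not taken from the application:

```python
import torch

def first_multivariate_error(reference, positive, negative, u=0.2):
    """L1 = max(S1 - S2 + u, 0): S1 is the Euclidean distance between the
    surface layer overall description vectors of the reference and positive
    templates, S2 between those of the reference and negative templates."""
    s1 = torch.norm(reference - positive, dim=-1)   # first Euclidean distance
    s2 = torch.norm(reference - negative, dim=-1)   # second Euclidean distance
    # The candidate parameter is 0, so take the larger of (S1 - S2 + u) and 0.
    return torch.clamp(s1 - s2 + u, min=0.0).mean()
```

In this reading, the preset reference value u plays the role of the triplet margin, and the candidate parameter 0 corresponds to the clamp at zero.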
Step 420: establish a second multivariate error result through the third production situation description vector corresponding to each debugging template.
The second multivariate error result is established analogously to the first, including:
Step 421: determine a third Euclidean distance between the third production situation description vector corresponding to the reference template and that corresponding to the positive template in the debugging template set, and a fourth Euclidean distance between the third production situation description vector corresponding to the reference template and that corresponding to the negative template.
Step 422: sum the difference between the third Euclidean distance and the fourth Euclidean distance with the preset reference value, and determine the summation result as a second target parameter; the preset reference value represents the extreme difference of the commonality measurement result between the positive template and the negative template.
Step 423: determine the larger of the second target parameter and the candidate parameter as the second multivariate error result.
The formula can refer to the formula for L1 and is not repeated here. The second multivariate error result drives the distance between the production situation description vectors of the reference template and the negative template to exceed the distance between those of the reference template and the positive template by at least u. It thereby ensures that the production situation description vectors of the reference template and the positive template stay close while those of the reference template and the negative template do not, so that positive and negative templates keep a larger distance and classification accuracy improves.
Step 430: establish a combined indication information error result through the inference result and the combined indication information vector corresponding to each debugging template, where the inference result represents the inferred confidence of each classification indication information for the debugging template and the combined indication information vector represents the actual confidence of each classification indication information for the debugging template.
The embodiment of the application may determine the combined indication information error result from the combined indication information vector, obtained from the combined indication information mark, and the inference result generated by the fully connected unit. The fully connected classification debugging constrains the classification embedding so that it carries classification information. However, because classification is prone to overfitting, which weakens the embedding's discriminative power, the fully connected unit may be configured with a tuning parameter an order of magnitude larger than that of the other units, so that during back-propagation into the state information embedding only one tenth of the fully connected unit's original gradient is used, preventing overfitting. A sketch of such gradient scaling follows.
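One common way to realize one-tenth gradient scaling, sketched here as an editorial assumption rather than the application's own code, is the detach trick:

```python
import torch

def scale_backward_gradient(x: torch.Tensor, factor: float = 0.1) -> torch.Tensor:
    """Leave the forward value of x unchanged, but scale the gradient that
    back-propagates through it by `factor` (one tenth, per the passage above).
    x * factor carries the gradient; x.detach() * (1 - factor) restores the
    forward value without contributing any gradient."""
    return x * factor + x.detach() * (1.0 - factor)

# Usage sketch (hypothetical names): damp the gradient flowing from the
# fully connected unit back into the state information embedding.
# embedding = scale_backward_gradient(embedding, factor=0.1)
# logits = fully_connected_unit(embedding)
```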
Step 440: obtain a cross entropy error result through the inference result corresponding to each debugging template.
Step 450: calculate the first multivariate error result, the second multivariate error result, the combined indication information error result, and the cross entropy error result according to a preset calculation mode to obtain a target error result.
For example, the error results may be weighted and summed to obtain the target error result, as sketched below.
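An illustrative sketch of such a weighted summation; the weight values are assumptions, since the application only specifies "a preset calculation mode":

```python
def target_error(l1, l2, l_combined, l_ce, weights=(1.0, 1.0, 0.5, 1.0)):
    """Weighted summation of the four error results into the target error:
    first multivariate, second multivariate, combined indication
    information, and cross entropy error results, in that order."""
    w1, w2, w3, w4 = weights
    return w1 * l1 + w2 * l2 + w3 * l_combined + w4 * l_ce
```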
In summary, the data archiving method and AI system based on the digital factory provided in the embodiments of the present application obtain a model capable of two-dimensional training from a similarity training mining network that mines surface layer overall description vectors and a bottom state information mining network that mines production situation description vectors. When the factory data archiving model archives a factory big data set to be archived, the factory big data can be described by combining surface layer description vectors with production situation description vectors, which improves model performance after collaborative debugging and increases description vector mining efficiency. The factory big data set can then be classified through the obtained integrated description vector, yielding accurate and reliable classification information and thus increasing the accuracy, reliability, and speed of factory big data set archiving.
Based on the same principle as the method shown in fig. 1, there is also provided in an embodiment of the present application a data archiving device 10, as shown in fig. 2, the device 10 comprising:
a factory data acquisition module 11 for acquiring a factory large data set to be archived in response to a data archiving instruction;
The surface layer vector mining module 12 is used for mining the first surface layer overall description vector of the factory large data set to be archived through the similarity training mining network in the pre-debugged factory data archiving model; wherein the first surface layer overall description vector indicates an overall initial information representation of the factory large data set to be archived;
a situation vector mining module 13, configured to mine a first production situation description vector of the large data set of the plant to be archived through a bottom state information mining network in the plant data archiving model; the first production situation description vector indicates production situation information representation of the large data set of the factory to be archived;
the production situation classifying module 14 is configured to integrate the first surface layer overall description vector and the first production situation description vector through a production situation reasoning network in the factory data archiving model, and then classify the factory big dataset to be archived through the obtained first integrated description vector by adopting combined indication information to obtain classification information of the factory big dataset to be archived; the factory data archiving model is obtained by adopting collaborative debugging on the basis of the similarity training mining network and the bottom layer state information mining network;
And the factory data archiving module 15 is used for archiving the factory large data set to be archived through the classification information of the factory large data set to be archived. A sketch of how these modules fit together follows.
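The following sketch wires the five virtual modules end to end in the order the description gives; every attribute and method name here is an editorial assumption, since the application defines only the modules' responsibilities:

```python
class DataArchivingDevice:
    """Sketch of device 10. The model attribute names (similarity_net,
    state_net, inference_net, classify) and the storage hooks are
    hypothetical stand-ins for the networks described above."""

    def __init__(self, model, storage):
        self.model = model      # pre-debugged factory data archiving model
        self.storage = storage  # archive backend

    def archive(self, instruction):
        dataset = self.acquire(instruction)                        # module 11
        surface = self.model.similarity_net(dataset)               # module 12
        situation = self.model.state_net(dataset)                  # module 13
        integrated = self.model.inference_net(surface, situation)
        labels = self.model.classify(integrated)                   # module 14
        self.storage.put(labels, dataset)                          # module 15
        return labels

    def acquire(self, instruction):
        # Placeholder: fetch the factory large data set named by the instruction.
        return self.storage.get(instruction["dataset_id"])
```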
The above embodiment describes the data archiving device 10 from the viewpoint of virtual modules; the following describes the data archiving AI system from the viewpoint of physical modules, specifically as follows:
the present embodiment provides a data archiving AI system, as shown in fig. 3, the data archiving AI system 100 includes: a processor 101 and a memory 103. Wherein the processor 101 is coupled to the memory 103, such as via bus 102. Optionally, the data archiving AI system 100 can also include a transceiver 104. It should be noted that, in practical applications, the transceiver 104 is not limited to one, and the structure of the data archiving AI system 100 is not limited to the embodiment of the present application.
The processor 101 may be a CPU, general purpose processor, GPU, DSP, ASIC, FPGA or other programmable logic device, transistor logic device, hardware component, or any combination thereof, and may implement or perform the various exemplary logic blocks, modules, and circuits described in connection with this disclosure. The processor 101 may also be a combination implementing computing functionality, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
Bus 102 may include a path to transfer information between the aforementioned components. Bus 102 may be a PCI bus, an EISA bus, or the like, and may be classified as an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in fig. 3, but this does not mean there is only one bus or only one type of bus.
Memory 103 may be, but is not limited to, a ROM or other type of static storage device that can store static information and instructions, a RAM or other type of dynamic storage device that can store information and instructions, an EEPROM, a CD-ROM or other optical storage (including compact discs, laser discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
The memory 103 is used for storing the application program code for executing the present application, and its execution is controlled by the processor 101. The processor 101 is configured to execute the application code stored in the memory 103 to implement the content shown in any of the foregoing method embodiments.
The embodiment of the application provides a data archiving AI system comprising one or more processors, a memory, and one or more computer programs, wherein the one or more computer programs are stored in the memory and configured to be executed by the one or more processors, and wherein the one or more programs, when executed by the processors, implement the data archiving method of the foregoing method embodiments. According to the technical scheme, a model capable of two-dimensional training is obtained from the similarity training mining network that mines the surface layer overall description vector and the bottom state information mining network that mines the production situation description vector. When the factory data archiving model provided by the embodiment of the application archives a factory large data set to be archived, the factory big data can be expressed and described by combining the surface layer description vector with the production situation description vector, so that model performance after collaborative debugging is improved and description vector mining efficiency increases. The factory large data set is classified through the obtained integrated description vector, and accurate and reliable classification information can be obtained, increasing the accuracy, reliability, and speed of factory large data set archiving.
Embodiments of the present application provide a computer readable storage medium having a computer program stored thereon which, when executed on a processor, enables the processor to perform the corresponding content of the foregoing method embodiments.
It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the order of execution is not strictly limited and the steps may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include a plurality of sub-steps or stages that are not necessarily performed at the same moment but may be performed at different moments, and their order of execution is not necessarily sequential; they may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
The foregoing describes only some embodiments of the present application. It should be noted that a person skilled in the art can make several improvements and modifications without departing from the principles of the present application, and these improvements and modifications should also be regarded as falling within the protection scope of the present application.

Claims (10)

1. A data archiving method based on a digital factory, characterized by being applied to a data archiving AI system, the method comprising:
responding to the data archiving instruction, and acquiring a large data set of a factory to be archived;
mining a first surface layer overall description vector of the factory large data set to be archived through a similarity training mining network in a pre-debugged factory data archiving model; wherein the first surface layer overall description vector indicates an overall initial information representation of the factory large data set to be archived;
mining a first production situation description vector of the factory large data set to be archived through a bottom layer state information mining network in the factory data archiving model; wherein the first production situation description vector indicates a production situation information representation of the factory large data set to be archived;
carrying out description vector integration on the first surface layer integral description vector and the first production situation description vector through a production situation reasoning network in the factory data archiving model, and then classifying the factory big data set to be archived through the obtained first integration description vector by adopting combined indication information to obtain classification information of the factory big data set to be archived; the factory data archiving model is obtained by adopting collaborative debugging on the basis of the similarity training mining network and the bottom layer state information mining network;
And archiving the large data set of the factory to be archived through the classification information of the large data set of the factory to be archived.
2. The method of claim 1, wherein the classifying of the factory large data set to be archived through the obtained first integrated description vector by adopting the combined indication information further comprises:
respectively mining second surface layer overall description vectors of each reference factory large data set in the factory large data set storage space through the similarity training mining network;
respectively mining second production situation description vectors of each reference factory large data set through the bottom layer state information mining network;
integrating the description vector of each second surface layer integral description vector and the corresponding second production situation description vector one by one through the production situation reasoning network to obtain second integration description vectors corresponding to each reference factory big data set;
determining a commonality measurement result of each reference factory big data set and the factory big data set to be archived respectively through the first integration description vector and a second integration description vector corresponding to each reference factory big data set;
And determining a matched factory large data set corresponding to the factory large data set to be archived according to each commonality measurement result.
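For illustration only (not part of the claims), the matching step of claim 2 could be sketched as follows; the use of a negated Euclidean distance as the commonality measurement is an editorial assumption:

```python
import numpy as np

def match_reference_dataset(first_vec, reference_vecs):
    """Score each reference factory large data set's second integrated
    description vector against the first integrated description vector,
    and return the index and score of the best match."""
    scores = [-float(np.linalg.norm(first_vec - ref)) for ref in reference_vecs]
    best = int(np.argmax(scores))   # highest commonality measurement result
    return best, scores[best]
```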
3. The method of claim 1, wherein the plant data archiving model further comprises an initial linear processing network;
before the mining, through the similarity training mining network in the pre-debugged factory data archiving model, of the first surface layer overall description vector of the factory large data set to be archived, the method further comprises:
loading the large data set of the factory to be archived into an initial linear processing network in the factory data archiving model, and carrying out overall description vector mining on the large data set of the factory to be archived through the initial linear processing network to obtain an overall description vector set corresponding to the large data set of the factory to be archived;
the mining, through the similarity training mining network in the pre-debugged factory data archiving model, of the first surface layer overall description vector of the factory large data set to be archived comprises: performing a lookup table mapping operation on the overall description vector set through the similarity training mining network to obtain the first surface layer overall description vector corresponding to the factory large data set to be archived;
The bottom layer state information mining network comprises a linear transformation module and a table look-up module; the mining of the first production situation description vector of the large data set of the plant to be archived through the bottom layer state information mining network in the plant data archiving model comprises the following steps: performing description vector mining on the production situation information in the whole description vector set through a linear transformation module in the bottom state information mining network, and performing lookup table mapping operation on the production situation information obtained through mining through a lookup table module in the bottom state information mining network to obtain a first production situation description vector corresponding to the large data set of the factory to be archived;
the tuning parameter of the production situation reasoning network is larger than the tuning parameter of other networks, and the other networks comprise the initial linear processing network, the similarity training mining network and the bottom layer state information mining network.
4. A method according to claim 3, wherein the initial linear processing network comprises a plurality of linear transformation modules;
the network parameters corresponding to the linear transformation modules in the initial linear processing network are obtained by preprocessing parameters debugged in advance through a preset debugging template set, and the network parameters corresponding to the linear transformation module in the bottom layer state information mining network are obtained by preprocessing in an arbitrary manner.
5. The method of claim 1, wherein the debugging process of the plant data archiving model comprises the steps of:
acquiring a debugging template library, and determining a debugging template set in the debugging template library;
loading the determined debugging template set into the factory data archiving model to be debugged, and acquiring a third surface layer overall description vector produced by the similarity training mining network in the factory data archiving model, a third production situation description vector produced by the bottom layer state information mining network, and an inference result produced by the production situation reasoning network, the inference result representing the inferred confidence of the debugging template corresponding to each classification indication information;
establishing a target error result through the third surface layer integral description vector, the third production situation description vector and the reasoning result, and performing iterative optimization on network parameters of the factory data archiving model through the target error result until the factory data archiving model meets a preset optimization cut-off condition, so as to obtain a debugged factory data archiving model;
the establishing a target error result through the third surface layer overall description vector, the third production situation description vector and the inference result comprises: establishing a first multivariate error result through the third surface layer overall description vector corresponding to each debugging template in the debugging template set; establishing a second multivariate error result through the third production situation description vector corresponding to each debugging template; establishing a combined indication information error result through the inference result and the combined indication information vector corresponding to each debugging template, wherein the combined indication information vector characterizes the actual confidence of each classification indication information corresponding to the debugging template; acquiring a cross entropy error result through the inference result corresponding to each debugging template; and calculating the first multivariate error result, the second multivariate error result, the combined indication information error result, and the cross entropy error result according to a preset calculation mode to obtain the target error result.
6. The method of claim 5, wherein each debugging template set comprises three debugging templates: a reference template, a positive template, and a negative template;
the step of establishing a first multi-component error result by the third surface layer integral description vector corresponding to each debugging template in the debugging template set comprises the following steps:
determining a first Euclidean distance between the third surface layer overall description vector corresponding to the reference template and that corresponding to the positive template in the debugging template set, and determining a second Euclidean distance between the third surface layer overall description vector corresponding to the reference template and that corresponding to the negative template;
summing the difference between the first Euclidean distance and the second Euclidean distance with a preset reference value to obtain a summation result, and determining the summation result as a first target parameter, wherein the preset reference value represents the extreme difference of the commonality measurement result between the positive template and the negative template;
and determining the larger of the first target parameter and the candidate parameter as the first multivariate error result.
7. The method of claim 5, wherein each debugging template set comprises three debugging templates: a reference template, a positive template, and a negative template;
the establishing a second multivariate error result through the third production situation description vector corresponding to each debugging template comprises:
determining a third Euclidean distance between the third production situation description vector corresponding to the reference template and that corresponding to the positive template in the debugging template set, and determining a fourth Euclidean distance between the third production situation description vector corresponding to the reference template and that corresponding to the negative template;
summing the difference between the third Euclidean distance and the fourth Euclidean distance with the preset reference value to obtain a summation result, and determining the summation result as a second target parameter, wherein the preset reference value represents the extreme difference of the commonality measurement result between the positive template and the negative template;
and determining the larger of the second target parameter and the candidate parameter as the second multivariate error result.
8. The method of claim 5, wherein prior to iteratively optimizing network parameters of the plant data archiving model by the target error result, the method further comprises:
calculating the first multivariate error result, the combined indication information error result, and the cross entropy error result according to a preset calculation mode to obtain a temporary error result;
and carrying out iterative debugging on network parameter values of the bottom layer state information mining network in the factory data archiving model according to a preset optimization round through the temporary error result.
9. The method according to any one of claims 1-8, wherein the set of debug templates in the library of debug templates is obtained based on the steps of:
based on the obtained commonality measurement results among the plurality of factory large data set templates, constructing a positive template group from every two factory large data set templates whose commonality measurement result is larger than the preset commonality measurement result;
template combination is carried out through each positive template group, a plurality of debugging template sets for establishing the debugging template library are obtained, each debugging template set comprises three debugging templates which are respectively a reference template, a positive template and a negative template, wherein the reference template and the positive template are factory large data set templates with a commonality measurement result being larger than a preset commonality measurement result, and the negative template and the positive template are factory large data set templates with the commonality measurement result being not larger than the preset commonality measurement result;
the performing of template combination based on the positive template groups to obtain a plurality of debugging template sets for establishing the debugging template library comprises:
determining a factory large data set template in a positive template group as a selected template, and respectively determining a factory large data set template in each of the remaining template groups as a temporary template;
determining a distance between the selected template and each of the temporary templates, respectively;
sorting the temporary templates in ascending or descending order of their corresponding distances, and determining at least one temporary template at a set position in the sequence as a negative template corresponding to the selected template;
and respectively constructing a debugging template set from each determined negative template and the positive template group, wherein the selected template in the positive template group is the positive template in the debugging template set, and the other template in the positive template group is the reference template in the debugging template set.
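For illustration only (not part of the claims), the template-set construction of claim 9 could be sketched as follows, under assumed data representations; all names, the distance metric, and the thresholding are editorial assumptions:

```python
import numpy as np

def build_debugging_sets(templates, commonality, threshold, k=1):
    """Pair templates whose commonality exceeds the threshold into positive
    template groups; within each group, one member is the selected (positive)
    template and the other the reference, and the k temporary templates
    nearest the selected template become negatives."""
    sets = []
    n = len(templates)
    for i in range(n):
        for j in range(i + 1, n):
            if commonality[i][j] <= threshold:
                continue                       # not a positive template group
            selected, reference = templates[i], templates[j]
            temporary = [templates[m] for m in range(n) if m not in (i, j)]
            temporary.sort(key=lambda t: float(np.linalg.norm(selected - t)))
            for negative in temporary[:k]:     # set positions in the ordering
                sets.append((reference, selected, negative))
    return sets
```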
10. A data archiving AI system comprising a processor and a memory, the memory storing a computer program, the processor configured to execute the computer program to implement the method of any one of claims 1-9.
CN202310220864.3A 2023-03-09 2023-03-09 Data archiving method and AI system based on digital factory Pending CN116340315A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310220864.3A CN116340315A (en) 2023-03-09 2023-03-09 Data archiving method and AI system based on digital factory


Publications (1)

Publication Number Publication Date
CN116340315A true CN116340315A (en) 2023-06-27

Family

ID=86888731

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310220864.3A Pending CN116340315A (en) 2023-03-09 2023-03-09 Data archiving method and AI system based on digital factory

Country Status (1)

Country Link
CN (1) CN116340315A (en)


Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116610806A (en) * 2023-07-20 2023-08-18 富璟科技(深圳)有限公司 AI-based RPA digital service processing method and computer equipment
CN116610806B (en) * 2023-07-20 2023-11-03 富璟科技(深圳)有限公司 AI-based RPA digital service processing method and computer equipment
CN116629348A (en) * 2023-07-21 2023-08-22 威海瑞沐精工科技有限公司 Intelligent workshop data acquisition and analysis method and device and computer equipment
CN116629348B (en) * 2023-07-21 2023-10-10 威海瑞沐精工科技有限公司 Intelligent workshop data acquisition and analysis method and device and computer equipment
CN117077018A (en) * 2023-10-12 2023-11-17 微网优联科技(成都)有限公司 Data processing method, device and storage medium based on machine learning
CN117077018B (en) * 2023-10-12 2023-12-19 微网优联科技(成都)有限公司 Data processing method, device and storage medium based on machine learning

Similar Documents

Publication Publication Date Title
CN116340315A (en) Data archiving method and AI system based on digital factory
Aydilek et al. A novel hybrid approach to estimating missing values in databases using k-nearest neighbors and neural networks
Samigulina et al. Modified immune network algorithm based on the Random Forest approach for the complex objects control
CN115185736B (en) Micro-service call chain abnormity detection method and device based on graph convolution neural network
CN113159273B (en) Neural network training method and related equipment
Graening et al. Shape mining: A holistic data mining approach for engineering design
Wade et al. Designing engineered resilient systems using set-based design
CN115618269A (en) Big data analysis method and system based on industrial sensor production
CN114358216B (en) Quantum clustering method based on machine learning framework and related device
Gellert et al. Estimation of missing LiDAR data for accurate AGV localization
Devin et al. Plan arithmetic: Compositional plan vectors for multi-task control
Batzolis et al. Machine learning in embedded systems: Limitations, solutions and future challenges
Kim et al. GraphDistNet: A graph-based collision-distance estimator for gradient-based trajectory optimization
Koguciuk et al. 3D object recognition with ensemble learning—A study of point cloud-based deep learning models
Huang et al. Incremental non-Gaussian inference for SLAM using normalizing flows
WO2024078112A1 (en) Method for intelligent recognition of ship outfitting items, and computer device
CN115905924B (en) Data processing method and system based on artificial intelligence Internet of things and cloud platform
CN111738086A (en) Composition method and system for point cloud segmentation and point cloud segmentation system and device
US20220164659A1 (en) Deep Learning Error Minimizing System for Real-Time Generation of Big Data Analysis Models for Mobile App Users and Controlling Method for the Same
CN117151231A (en) Method, device and medium for solving linear system by using variable component sub-line
Bütepage et al. Gaussian process encoders: Vaes with reliable latent-space uncertainty
CN113779116A (en) Object sorting method, related equipment and medium
Cabras et al. A random forest application to contact‐state classification for robot programming by human demonstration
WO2019220653A1 (en) Causal relation estimating device, causal relation estimating method, and causal relation estimating program
Rajeswari et al. Feature Selection Method based on Fisher’s Exact Test for Agricultural Data

Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination