CN114625901B - Multi-algorithm integration method and device - Google Patents

Multi-algorithm integration method and device

Info

Publication number
CN114625901B
Authority
CN
China
Prior art keywords
service
algorithms
data
algorithm
result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210519444.0A
Other languages
Chinese (zh)
Other versions
CN114625901A (en
Inventor
王冲
唐建松
张晟辉
张犇
朱云
何远峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Dimension Software Co ltd
Original Assignee
Nanjing Dimension Software Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Dimension Software Co ltd
Priority to CN202210519444.0A
Publication of CN114625901A
Application granted
Publication of CN114625901B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/55 Clustering; Classification
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/5866 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, manually generated location and time information

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Library & Information Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a multi-algorithm integration method and device. The method comprises: collecting registration information of a plurality of service algorithms and registering them; integrating the interfaces of the registered service algorithms; calling the databases of the registered service algorithms and receiving the data of each database; extracting features from the data of each database; performing association, comparison and identification processing on the extracted features, and storing them by feature type; receiving a service task request; determining at least two optimal service algorithms from the registered service algorithms according to the service task request; and executing the service task through the at least two optimal service algorithms and integrating the execution results. By integrating and comparing multiple algorithms and optimally matching service task requests to algorithms, the method achieves efficient integration of multiple algorithms, increases algorithm calling speed, and improves the efficiency and accuracy of service task execution.

Description

Multi-algorithm integration method and device
Technical Field
The invention relates to the technical field of computers, in particular to a multi-algorithm integration method and device.
Background
With the continuous development of image processing technology, video image information systems are increasingly used in public security, customs and other inspection work. In deployment-and-control and personnel-merging work, such a system is required to provide multiple services such as image analysis, image deployment and control, and image retrieval, so service algorithms provided by multiple manufacturers must be adopted. Because the technical implementations of different manufacturers differ, and the ways the algorithms are used also differ, the multi-algorithm design of a video image information system needs a general-purpose integration of the algorithms so that sampling inspection work can proceed more smoothly.
Various algorithm integration schemes already exist. For example, patent document CN112988384A discloses a scene-based method for automatically integrating and calling algorithm resources, which pre-calibrates algorithm capability, encapsulates the algorithm models of the input algorithm resources in a uniform format as callable interfaces, and deploys and opens the received algorithm interfaces and verifies their operation mechanism, so as to match an optimal algorithm interface for the user.
That scheme distributes tasks and calls the various algorithm resources in an integrated way, so that the most suitable algorithm interface is automatically evaluated and recommended, the interface formats of similar algorithms are unified, and the user spends less time selecting an algorithm interface. However, the method selects the algorithm that best matches the task by algorithm capability alone, so its matching accuracy is poor and its efficiency is low.
Disclosure of Invention
The invention provides a multi-algorithm integration method and device which integrate and compare multiple algorithms and optimally match service task requests to algorithms, thereby achieving efficient integration of multiple algorithms, increasing algorithm calling speed, and improving the efficiency and accuracy of service task execution.
A method of multi-algorithm integration, comprising:
collecting registration information of a plurality of service algorithms and registering;
integrating interfaces of a plurality of registered service algorithms;
calling the databases of the plurality of registered service algorithms and receiving the data of each database;
extracting the characteristics of the data of each database;
performing association, comparison and identification processing on the extracted features, and performing classified storage according to feature types;
receiving a service task request;
determining at least two optimal service algorithms from the registered service algorithms according to the service task request;
and executing the service tasks through the at least two optimal service algorithms, and integrating the execution results of the executed service tasks.
Further, interface integration is performed on the registered service algorithms, and the method comprises the following steps:
and carrying out unified assignment on the interface parameters and the return result formats of all the service algorithms to form a universal interface.
Further, the service algorithm includes a classification algorithm, a regression algorithm and a clustering algorithm.
Further, determining at least two optimal service algorithms from the registered plurality of service algorithms according to the service task request, comprising:
analyzing the service task request to obtain a setting parameter related to the service task;
calling at least three service algorithms of different types, inputting the setting parameters into the service algorithms of different types, and obtaining output result parameters;
and verifying the output result parameters, and determining at least two optimal service algorithms according to the verification result.
Further, verifying the output result parameters, and determining at least two optimal service algorithms according to the verification result, including:
respectively establishing N data sets according to the N output result parameters, wherein N is an integer greater than or equal to 3;
in each round of verification, one data set is selected from the data sets to serve as verification data, the other data sets serve as training data, the training data are input into a service algorithm to be trained, then the verification data are input to be verified, the mean square error is calculated, and the mean square error average value of the service algorithm is calculated after N rounds of verification;
performing the verification on each service algorithm to obtain the mean square error average value of each service algorithm, and sorting the service algorithms in ascending order of their mean square error average values;
and selecting a preset number of service algorithms which are ranked in the front to determine as the optimal service algorithm, wherein the preset number is more than or equal to two.
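The verification rounds described above resemble N-fold cross-validation. The following is a minimal sketch under stated assumptions, not the patented implementation: each service algorithm is represented by a hypothetical `fit(xs, ys)` function that returns a predictor, and the names `mean_cv_mse`, `rank_algorithms`, `fit_zero`, `fit_mean` and `fit_line`, as well as the toy data, are illustrative only.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence

# Hypothetical types: "fit" trains on data and returns a predictor function.
Predictor = Callable[[float], float]
Fit = Callable[[Sequence[float], Sequence[float]], Predictor]


def mean_cv_mse(fit: Fit, xs: Sequence[float], ys: Sequence[float], n_folds: int) -> float:
    """Average the per-round mean square error over n_folds verification rounds."""
    if n_folds < 3 or len(xs) < n_folds:
        raise ValueError("need N >= 3 data sets and at least one sample per set")
    folds = [list(range(k, len(xs), n_folds)) for k in range(n_folds)]
    round_errors = []
    for held_out in folds:
        # One data set verifies; the remaining data sets train.
        train = [i for i in range(len(xs)) if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        round_errors.append(mean([(model(xs[i]) - ys[i]) ** 2 for i in held_out]))
    return mean(round_errors)


def rank_algorithms(candidates: Dict[str, Fit], xs: Sequence[float], ys: Sequence[float],
                    n_folds: int = 3, keep: int = 2) -> List[str]:
    """Sort candidates by ascending average MSE and keep the first `keep` names."""
    if keep < 2:
        raise ValueError("the preset number must be at least two")
    scores = {name: mean_cv_mse(fit, xs, ys, n_folds) for name, fit in candidates.items()}
    return sorted(scores, key=scores.get)[:keep]


# Three toy "service algorithms" (assumptions for illustration only).
def fit_zero(xs, ys):
    return lambda x: 0.0


def fit_mean(xs, ys):
    centre = mean(ys)
    return lambda x: centre


def fit_line(xs, ys):
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return lambda x: my + slope * (x - mx)


xs = [float(i) for i in range(9)]
ys = [2 * x + 1 for x in xs]
best = rank_algorithms({"zero": fit_zero, "mean": fit_mean, "line": fit_line}, xs, ys)
```

On this linear toy data the straight-line model has the smallest average error, so it ranks first, followed by the mean predictor.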
Further, the mean square error is calculated by the following formula:
$$\mathrm{MSE} = \frac{1}{M-r}\sum_{i=1}^{r}\left(n_i-1\right)s_i^{2}$$
wherein $\mathrm{MSE}$ is the mean square error, M is the total number of the result data and the verification data output by training, r is the number of groups of the result data and the verification data output by training, M-r is the degree of freedom, $n_i$ is the size of the i-th sample, and $s_i^{2}$ is the sample variance generated for each group.
The mean square error average is calculated according to the following formula:
$$E = \frac{1}{N}\sum_{i=1}^{N}\mathrm{MSE}_i$$
wherein E is the mean square error average, N is the number of verifications, and $\mathrm{MSE}_i$ is the mean square error obtained in the verification of the i-th round.
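As a worked illustration of the definitions above, the sketch below assumes that the mean square error is the within-group mean square, i.e. the sum over the r groups of (group size minus one) times the group sample variance, divided by the degrees of freedom M-r. That reading is an assumption drawn from the listed symbols; the function names `within_group_mse` and `average_mse` are hypothetical.

```python
from statistics import mean, variance
from typing import Sequence


def within_group_mse(groups: Sequence[Sequence[float]]) -> float:
    """Pooled mean square error: sum((n_i - 1) * s_i^2) / (M - r)."""
    m_total = sum(len(g) for g in groups)   # M: total number of data points
    r = len(groups)                         # r: number of groups
    if any(len(g) < 2 for g in groups):
        raise ValueError("each group needs at least two values for a sample variance")
    return sum((len(g) - 1) * variance(g) for g in groups) / (m_total - r)


def average_mse(round_mses: Sequence[float]) -> float:
    """E: the mean of the per-round mean square errors over N verification rounds."""
    return mean(round_mses)


# Two groups with sample variances 1 and 4, so MSE = (2*1 + 2*4) / (6 - 2).
example_mse = within_group_mse([[1, 2, 3], [2, 4, 6]])
example_average = average_mse([2.5, 3.5])
```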
Further, integrating the execution result of executing the service task includes:
and carrying out weighted average or simple average on the execution result to obtain an integrated result.
Further, integrating the execution result of executing the service task includes:
and carrying out classification voting on the execution result to obtain an integration result.
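The two integration options above (averaging for numeric results, voting for class labels) can be sketched as follows. This is a minimal illustration; the function names and the tie-breaking rule are assumptions, since the text does not specify them.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def integrate_by_average(results: Sequence[float],
                         weights: Optional[Sequence[float]] = None) -> float:
    """Weighted average of numeric results; a simple average when weights is None."""
    if not results:
        raise ValueError("no execution results to integrate")
    if weights is None:
        weights = [1.0] * len(results)
    if len(weights) != len(results) or sum(weights) <= 0:
        raise ValueError("weights must match the results and sum to a positive value")
    return sum(w * r for w, r in zip(weights, results)) / sum(weights)


def integrate_by_vote(labels: Sequence[Hashable]) -> Hashable:
    """Majority vote over class labels; on a tie the label seen first wins (assumption)."""
    if not labels:
        raise ValueError("no execution results to integrate")
    return Counter(labels).most_common(1)[0][0]
```

A weighted average lets an algorithm with a lower verification error contribute more to the integrated result, while voting suits categorical outputs such as match / no match.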
A multi-algorithm integration device for carrying out the method comprises:
the registration module is used for acquiring registration information of a plurality of service algorithms and registering;
the interface integration module is used for integrating the interfaces of the registered service algorithms;
the database calling module is used for calling the registered databases of the plurality of service algorithms and receiving the data of each database;
the database feature extraction module is used for extracting features of data of each database;
the database integration module is used for performing association, comparison and identification processing on the extracted features and performing classified storage according to feature types;
the receiving module is used for receiving a service task request;
the determining module is used for determining at least two optimal service algorithms from the registered service algorithms according to the service task request;
and the result integration module is used for executing the service tasks through the at least two optimal service algorithms and integrating the execution results of the executed service tasks.
An electronic device comprises a processor and a storage device, wherein the storage device stores a plurality of instructions, and the processor is used for reading the instructions and executing the method.
The multi-algorithm integration method and the device provided by the invention at least have the following beneficial effects:
(1) The interfaces of a plurality of service algorithms and the related databases are integrated to form external interfaces with a uniform format and databases that are more convenient to call and compare, so that the running speed of the algorithms is increased and the execution efficiency of service tasks is improved.
(2) The optimal algorithms can be matched from various algorithms for the service task request, the obtained result generalization capability is better, more stable and comprehensive, the error of algorithm execution is minimized, and the accuracy of task execution is improved.
(3) The algorithm results are integrated through a voting method and an averaging method, and the accuracy of the service execution results is improved.
Drawings
Fig. 1 is a flowchart of an embodiment of a multi-algorithm integration method provided in the present invention.
Fig. 2 is a flowchart of an embodiment of a method for verifying an output result parameter in the method provided by the present invention.
Fig. 3 is a schematic structural diagram of an embodiment of a multi-algorithm integration apparatus provided in the present invention.
Fig. 4 is a schematic structural diagram of an embodiment of an electronic device provided in the present invention.
Reference numerals: 1-a processor, 101-a registration module, 102-an interface integration module, 103-a database calling module, 104-a database feature extraction module, 105-a database integration module, 106-a receiving module, 107-a determination module, 108-a result integration module and 2-a storage device.
Detailed Description
In order to better understand the technical solution, the technical solution will be described in detail with reference to the drawings and the specific embodiments.
Referring to fig. 1, in some embodiments, there is provided a multi-algorithm integration method comprising:
S1, collecting and registering registration information of a plurality of service algorithms;
S2, integrating interfaces of a plurality of registered service algorithms;
S3, calling the databases of the registered service algorithms and receiving the data of each database;
S4, extracting the characteristics of the data of each database;
S5, performing association, comparison and identification processing on the extracted features, and performing classified storage according to feature types;
S6, receiving a service task request;
S7, determining at least two optimal service algorithms from the registered service algorithms according to the service task request;
and S8, executing the service tasks through the at least two optimal service algorithms, and integrating the execution results of the executed service tasks.
Specifically, in some embodiments, for example in the field of deployment and control, the multiple service algorithms collected in step S1 give the algorithm platform the following service capabilities: an image structured analysis service, an image deployment and control service, an image retrieval service, an image scheduling service and an image autonomous comparison service.
The image structured analysis service refers to online image stream analysis capability, that is, the capability to perform structured analysis on image streams imported in real time, specifically including analysis of faces, human bodies and other accessories, so as to form face track structured data that meets the specification. The image deployment and control service uses image similarity comparison to provide real-time, image-based deployment-and-control comparison; it also supports online comparison for images of faces, human bodies and the like, and achieves deployment-and-control early warning by building deployment-and-control object libraries of various kinds and scopes and comparing the features of person objects extracted from online image streams with the features in a specified object library (such as an escaped-person library or a key-object library). The image retrieval service uses image feature extraction and computation to provide retrieval based on image similarity comparison, and returns results according to the set image similarity and other retrieval conditions (time, space, element attributes and the like). The image scheduling service builds retrieval capability for various images (including local small images and overall background images) on top of the image storage environments handled by the video image analysis platform; it includes image retrieval by identifiers such as access address and image ID, and supports batch query and retrieval of images by time range and point location. The image autonomous comparison service compares the similarity of a specified image or image set based on image similarity comparison capability and returns the results that satisfy the similarity requirement; it supports multiple comparison modes such as 1:1, 1:q and q:q.
The 1:1 mode compares the similarity of two designated images to determine whether they point to the same object; for example, comparing an identity document photo with an on-site snapshot completes identity verification of the person being checked. The 1:q mode compares a given image with a particular image library to find which images in the library reach a given similarity; for example, comparing a photo of the person being checked with the key-object library determines whether that person is a key object. The q:q mode compares two image sets to find overlapping objects with close similarity, for the purpose of data intersection analysis; for example, comparing the sets of photographs of suspects appearing in two different cases supports case-linkage analysis.
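The three comparison modes can be sketched over feature vectors as follows. This is a minimal illustration that assumes images have already been reduced to numeric feature vectors and that similarity is measured by cosine similarity; the text names neither the similarity measure nor these function names, so both are assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two feature vectors (0.0 if either vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def compare_1_1(a: Vector, b: Vector, threshold: float) -> bool:
    """1:1 mode: do the two images point to the same object?"""
    return cosine(a, b) >= threshold


def compare_1_q(probe: Vector, library: Dict[str, Vector], threshold: float) -> List[str]:
    """1:q mode: which library entries reach the given similarity to the probe?"""
    return [key for key, vec in library.items() if cosine(probe, vec) >= threshold]


def compare_q_q(set_a: Dict[str, Vector], set_b: Dict[str, Vector],
                threshold: float) -> List[Tuple[str, str]]:
    """q:q mode: pairs of entries across two sets that are close enough to overlap."""
    return [(ka, kb) for ka, va in set_a.items() for kb, vb in set_b.items()
            if cosine(va, vb) >= threshold]
```

The q:q mode as written compares every pair, which is quadratic in the set sizes; a production system would likely use an index, which is outside the scope of this sketch.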
The registration information of the plurality of service algorithms comprises detailed information of each algorithm, corresponding use modes, technical specifications and other information, and specifically comprises algorithm basic information, algorithm monitoring analysis, algorithm heartbeat monitoring records, algorithm use log records and algorithm data reconciliation.
The algorithm basic information comprises: source algorithm protocol (e.g., webservice), return result format (e.g., xml), execution mode (e.g., post), source interface address, algorithm authorization password information, algorithm providing unit, algorithm provider contact, source algorithm usage description (e.g., the algorithm's API document), monitoring status (e.g., monitoring once every M minutes, where no result returned within K seconds is treated as a no-response exception), and affiliated service.
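The basic information listed above maps naturally onto a registration record. The sketch below is one possible shape, assuming a hypothetical `name` key and an in-memory registry; the class and field names are illustrative and are not taken from the patent or from any specification.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class AlgorithmRegistration:
    name: str                      # hypothetical unique key for the registry
    protocol: str                  # source algorithm protocol, e.g. "webservice"
    result_format: str             # return result format, e.g. "xml"
    execution_mode: str            # e.g. "post"
    source_address: str            # source interface address
    authorization: str             # algorithm authorization information
    provider_unit: str
    provider_contact: str
    usage_doc: str                 # source algorithm usage description
    monitor_interval_minutes: int  # "M" in the text
    no_response_seconds: int       # "K" in the text
    service: str                   # affiliated service


class Registry:
    """In-memory registry keyed by algorithm name (an assumption for illustration)."""

    def __init__(self) -> None:
        self._items: Dict[str, AlgorithmRegistration] = {}

    def register(self, reg: AlgorithmRegistration) -> None:
        if reg.name in self._items:
            raise ValueError(f"algorithm {reg.name!r} is already registered")
        if reg.monitor_interval_minutes <= 0 or reg.no_response_seconds <= 0:
            raise ValueError("M and K must be positive")
        self._items[reg.name] = reg

    def by_service(self, service: str) -> List[AlgorithmRegistration]:
        return [r for r in self._items.values() if r.service == service]
```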
The algorithm monitoring analysis displays the daily running condition of the algorithm. In a preferred implementation, a monitoring analysis report is provided as a two-dimensional chart in which the x axis shows the dates of the last 30 days and the y axis shows the normal or abnormal condition of each day. A switch in the upper right corner of the report selects either duration or count for the y axis, with duration shown by default. For each day the y axis shows two columns, one for the abnormal duration (in hours) and the other for the normal duration (in hours).
The algorithm heartbeat monitoring record comprises: query conditions, list presentation, and list content. The query conditions comprise a monitoring time selected by range and a monitoring result; the list presentation comprises sequence number, monitoring time, response speed (in milliseconds) and monitoring result, where the monitoring result is one of three types: no response, slow response and normal response.
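The three monitoring results can be derived from a measured response time as in the sketch below. The no-response limit corresponds to the K seconds mentioned in the basic information; the `slow_ms` threshold and the function name are assumptions, since the text does not say when a response counts as slow.

```python
from typing import Optional


def classify_heartbeat(response_ms: Optional[float],
                       no_response_s: float,
                       slow_ms: float) -> str:
    """Map one heartbeat probe to one of the three monitoring results.

    response_ms is None when the probe received no reply at all.
    """
    if response_ms is None or response_ms > no_response_s * 1000:
        return "no response"
    if response_ms > slow_ms:
        return "slow response"
    return "normal response"
```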
The algorithm usage log record comprises: query conditions, list presentation, and list content. The query conditions comprise usage time and result return time, both selected by range, the using unit and the using IP; the list presentation comprises sequence number, usage time, return speed, using IP, using unit and request parameter details.
The algorithm data reconciliation comprises: query conditions, list presentation, and list content. The query conditions comprise a reconciliation time selected by range and a reconciliation result, where the reconciliation result is one of three types: normal data, lost data and abnormal increase; the list presentation comprises sequence number, reconciliation time, access data type, access data volume, output data volume and docking result.
In step S2, the interface integration of the registered multiple service algorithms includes:
and carrying out unified assignment on the interface parameters and the return result formats of all the service algorithms to form a universal interface.
In a preferred implementation, based on the unified GA/T 1400 service interface specification, the specific usage parameters and return result formats of the algorithm interfaces of different manufacturers are integrated to form a uniform assignment standard and a universal interface. A general REST framework is used to implement the scheduling interface, and two interface checking mechanisms are provided to ensure the timeliness and accuracy of the service. The first is timeout control: to avoid harming the user experience when a dispatched service interface gives no feedback for a long time, the dispatch of any interface that has not returned a result after 5 seconds is automatically terminated. The second is repeated requests: to avoid response failures caused by factors such as network instability, a service interface whose response fails is dispatched again, and the interface response rate is improved through repeated dispatch.
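The two checking mechanisms (timeout control and repeated requests) can be combined in a small dispatcher. The sketch below is an assumption-laden illustration: `dispatch`, `http_get` and the `max_attempts` limit are hypothetical, and the 5-second value comes from the text. `urllib.request.urlopen` applies its `timeout` to each blocking socket operation rather than to the whole call, so this only approximates a hard 5-second cutoff.

```python
import urllib.request
from typing import Callable, Optional


def http_get(url: str, timeout_s: float) -> bytes:
    """Fetch a URL, giving up when a socket operation blocks longer than timeout_s."""
    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
        return resp.read()


def dispatch(request_fn: Callable[[float], bytes],
             timeout_s: float = 5.0,
             max_attempts: int = 3) -> bytes:
    """Call request_fn(timeout_s), retrying when the call fails or times out."""
    last_error: Optional[BaseException] = None
    for _ in range(max_attempts):
        try:
            return request_fn(timeout_s)
        except OSError as exc:
            # URLError, TimeoutError and ConnectionError are all OSError subclasses.
            last_error = exc
    raise RuntimeError(f"no response after {max_attempts} attempts") from last_error
```

A caller would wrap a concrete request, for example `dispatch(lambda t: http_get(url, t))`, where `url` is the address of a registered algorithm interface. Bounding the number of attempts is a design choice made here so that an unreachable interface cannot be retried forever.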
For image analysis interfaces that are used online and must process real-time data or respond to requests in real time, such as face retrieval and face similarity comparison, the algorithm API interfaces of different manufacturers are converted to a standard form, designed on the basis of the relevant view library specifications, to form a universal interface. Specifically, the RESTful interface specification is adopted as the usage mode of the universal interface, so that the various HTTP interfaces and SDK-based development kits of different manufacturers can be converted to the standard form and applied to various service scenarios.
The relevant specification interfaces include public interfaces, collection interfaces and service interfaces. The public interfaces comprise four interfaces: registration, keep-alive, deregistration and time calibration. The collection interfaces comprise interfaces for uploading video clips, images, files, faces, motor vehicles, non-motor vehicles, articles, scenes and the like. The service interfaces comprise interfaces for querying and maintaining video clips, images, files, faces, motor vehicles, non-motor vehicles, articles, scenes, deployment-and-control tasks, alarm information, subscription records, notification records and the like.
The relevant service specification is as follows: the Content-Type header field of interface messages should be set to application/*+JSON. For the GET method, the returned item is the result object when the query succeeds (i.e., the HTTP response status code is 2XX); when the query does not succeed (i.e., the HTTP response status code is not 2XX), the returned object is ResponseStatus. In one specific application scenario, the canonical service interface is shown in Table 1.
TABLE 1
[Table 1: canonical service interface; the table was an image and is not reproduced here]
The definitions of Register and ResponseStatus should conform to the GA/T 1400.3 protocol, where the ID of ResponseStatus is the DeviceID requesting registration, StatusCode is the operation response code of this registration, StatusString is the operation response description of this registration, and LocalTime is the system time of the registered party, which may be used for time calibration of the registered party.
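The success and failure cases described for the GET method can be handled as in the following sketch. It assumes that the response body is a flat JSON object whose keys are the four ResponseStatus fields named above (ID, StatusCode, StatusString, LocalTime); the actual message layout is defined by GA/T 1400 and may differ, and `parse_get_response` and `QueryFailed` are hypothetical names.

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass
class ResponseStatus:
    id: str             # DeviceID that requested the operation
    status_code: Any    # operation response code
    status_string: str  # operation response description
    local_time: str     # system time reported by the responding party


class QueryFailed(Exception):
    """Raised when the HTTP status is not 2XX; carries the parsed ResponseStatus."""

    def __init__(self, status: ResponseStatus) -> None:
        super().__init__(f"{status.status_code}: {status.status_string}")
        self.status = status


def parse_get_response(http_status: int, body: str) -> Any:
    """Return the result object on 2XX, otherwise raise QueryFailed."""
    payload = json.loads(body)
    if 200 <= http_status < 300:
        return payload
    raise QueryFailed(ResponseStatus(
        id=payload.get("ID", ""),
        status_code=payload.get("StatusCode"),
        status_string=payload.get("StatusString", ""),
        local_time=payload.get("LocalTime", ""),
    ))
```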
In step S3, the databases of the registered service algorithms are called and the data of each database is received, in preparation for database integration. Database integration is a data standardization process that supports real-time computation, offline computation and batch processing, and its data transmission supports a distributed mode. During data processing, artificial intelligence techniques such as graph computing and in-memory computing are introduced to handle both structured and unstructured data and increase the value of the data. A model system, a label system and knowledge graph technology are also introduced to further raise the value density of the data and to provide data value-added, data preparation and data abstraction for intelligent data applications. Where algorithm processing depends on a designated database, such as the database used by the image deployment-and-control comparison service or the database used for personnel merging (e.g., an Oracle database or an HBase database), two approaches are provided: view conversion and database docking. View conversion converts the general database into the view structure required by the algorithm and provides that view to the algorithm. Database docking exchanges the analysis results output by the algorithm from the algorithm's own database into the general database, converting them to the standard structure during the exchange.
In step S4, feature extraction extracts the features relevant to algorithm processing, such as names, text, pictures, ID card numbers, mobile phone numbers, key frames, facial features, fingerprint features, voiceprint features and iris features, from the full-text structured data, web page information, multimedia information, biometric information and other content contained in the data of each database. The extracted features are then filtered and cleaned to obtain higher-quality data, which specifically includes: unifying data formats, removing duplicate errors and association errors, correcting content errors and logic errors, correcting data inconsistencies, splitting content, supplementing missing data and the like.
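A few of the cleaning steps named above (unifying formats, removing duplicates, making missing data explicit) can be sketched as follows. The record layout with `name` and `phone` keys is a hypothetical example rather than a schema from the patent, and the remaining cleaning steps are not covered.

```python
import re
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def normalize_phone(raw: Optional[str]) -> Optional[str]:
    """Unify the phone format by keeping digits only; None when nothing is left."""
    digits = re.sub(r"\D", "", raw or "")
    return digits or None


def clean_records(records: List[Record]) -> List[Record]:
    """Unify formats, drop duplicates and mark missing fields explicitly as None."""
    cleaned: List[Record] = []
    seen = set()
    for rec in records:
        name = (rec.get("name") or "").strip() or None
        phone = normalize_phone(rec.get("phone"))
        key = (name, phone)
        if key in seen:          # duplicate after normalization
            continue
        seen.add(key)
        cleaned.append({"name": name, "phone": phone})
    return cleaned
```

Normalizing before de-duplicating matters here: two records that differ only in spacing or punctuation collapse into one.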
In step S5, the cleaned data features undergo association, comparison and identification processing; the data is output by category according to the data standards and stored according to defined storage specifications and strategies. Specifically, association includes associating Internet site data, fixed network data, mobile Internet data and the like; comparison includes comparison of keywords, text, voice and images, binary data, structured data and the like; identification includes labeling the business, region, national language, spatial location, sensitivity level and the like. The processed database data is classified by feature type and stored in databases such as a resource library, a subject library and a special topic library.
In a preferred implementation, database integration uses a database integration platform. The platform supports automatic data processing and checking according to strategies and rules, adopts a dynamic, configurable and extensible open architecture, supports dynamic orchestration of the data preprocessing stages, establishes a unified coordination mechanism for the data flow and the control flow, provides unified addressing management over the data life cycle, and ensures data integrity and consistency. A data caching mechanism handles instantaneous peaks in the data stream, and the platform can process structured and unstructured data such as multimedia, text and encrypted files. The platform processes structured, semi-structured and unstructured data according to data standards, data verification rules and data processing strategies. It takes into account the variety of data types, the massive data volume, the multi-source heterogeneity of the data, the complexity of data formats and the online timeliness of the data, and builds a data resource fusion system with all-round acquisition, network-wide convergence and multi-dimensional integration. Guided by intelligent applications and by automated, intelligent data processing, it improves the degree of data association and the closeness to the business, improves data quality, mines the potential and value of data resources, establishes a massive data resource pool through scientific classification, lays the foundation for data organization and storage, and effectively supports the practical applications of each service department.
Specifically, the database integration platform supports real-time streaming data processing, offline data computation and distributed data management. Data storage supports data types such as relational databases, column-family databases, graph databases, text files, binary files, video formats, audio formats, picture formats, large objects, serialized data, XML, JSON, general machine learning models and statistical analysis models. The platform supports real-time and offline labeling of processed data, and supports label engineering and knowledge graph technology. The platform can generate analysis reports on the quality of processed data: the data is graded by a data quality evaluation model, and the reports comprise a data consistency report, a data integrity report and a data credibility report. The platform records detailed data processing operation logs at the operation, service and system levels, and supports auditing, system maintenance, tuning analysis, problem tracking and the like based on those logs. The platform supports automatic and manual handling of problem data, so as to analyze why data is unqualified, resolve the problem, improve the quality of accessed data, and support supplementary entry, repair and reuse of problem data. The platform also supports monitoring the running state of data processing, statistical analysis of the data, and quality monitoring.
In step S7, determining at least two optimal service algorithms from the registered plurality of service algorithms according to the service task request includes:
S71, analyzing the service task request to obtain the setting parameters of the service task;
S72, calling at least three service algorithms of different types, inputting the setting parameters into the service algorithms of different types, and obtaining output result parameters;
S73, verifying the output result parameters, and determining at least two optimal service algorithms according to the verification result.
In step S71, most algorithms require a number of parameters to be set; these parameters control the behavior of the algorithm and allow the platform performance to be maximized.
In step S72, some learning algorithms make certain assumptions about the structure of the data or the expected result; if an algorithm type that fits the need is found, it can provide more useful results, more accurate predictions, or a shorter training time. Referring to Table 2, when comparing and selecting among classification, regression and clustering algorithms, the most important characteristics of each algorithm are evaluated, chiefly accuracy, training time, linearity and the number of main parameters.
TABLE 2
[Table 2 is provided as an image in the original publication.]
Determining the optimal service algorithms is described below for a specific application scenario. For a task of extracting human face features, for example, the setting parameters of the task are extracted; they include the loss function, learning rate, kernel function, smoothing curve function, class weights, penalty coefficient, number of iterations and the like. The HoG algorithm, the Dlib algorithm and a convolutional neural network feature extraction algorithm are then called, the setting parameters are input, the algorithms are run, and a list of result parameters is output. The output result parameters are verified, and at least two optimal service algorithms are determined according to the verification results.
Referring to fig. 2, in step S73, verifying the output result parameters and determining the optimal service algorithms according to the verification result includes:
S731, establishing N data sets from the N output result parameters respectively, where N is an integer greater than or equal to 3;
S732, in each round of verification, selecting one of the data sets as verification data and using the other data sets as training data, inputting the training data into a service algorithm for training, then inputting the verification data for verification and calculating a mean square error, and, after N rounds of verification, calculating the mean square error average of the service algorithm;
S733, performing this verification on each service algorithm to obtain the mean square error average of each service algorithm, and sorting the service algorithms in ascending order of their mean square error averages;
S734, selecting a preset number of top-ranked service algorithms as the optimal service algorithms, where the preset number is greater than or equal to two.
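By way of a non-limiting sketch, the verification and ranking of steps S731 to S734 might look as follows. The three toy "service algorithms", the data folds and every function name are hypothetical and introduced only for illustration, and the per-round error here is the ordinary mean of squared residuals, which is an assumption and not necessarily the formula used in this embodiment.

```python
from statistics import mean

def k_fold_mean_mse(make_model, folds):
    """Average validation MSE over len(folds) rounds, holding out one fold per round."""
    round_errors = []
    for i, held_out in enumerate(folds):
        # Training data is every fold except the one held out for this round.
        train = [pair for j, fold in enumerate(folds) if j != i for pair in fold]
        model = make_model(train)
        round_errors.append(mean((model(x) - y) ** 2 for x, y in held_out))
    return mean(round_errors)

def select_optimal(candidates, folds, preset_number=2):
    """Sort candidates by mean MSE in ascending order and keep the first preset_number."""
    scores = {name: k_fold_mean_mse(make, folds) for name, make in candidates.items()}
    ranked = sorted(scores, key=scores.get)
    return ranked[:preset_number], scores

# Hypothetical candidates: each factory trains on (x, y) pairs and returns a predictor.
def mean_predictor(train):
    avg = mean(y for _, y in train)
    return lambda x: avg

def line_through_origin(train):
    slope = sum(x * y for x, y in train) / sum(x * x for x, _ in train)
    return lambda x: slope * x

def always_zero(train):
    return lambda x: 0.0

# N = 3 hypothetical data sets, roughly following y = 2x.
folds = [[(1, 2.1), (2, 3.9)], [(3, 6.2), (4, 8.1)], [(5, 9.8), (6, 12.1)]]
candidates = {"mean": mean_predictor, "line": line_through_origin, "zero": always_zero}
best, scores = select_optimal(candidates, folds)
print(best)  # ['line', 'mean']
```

Keeping the ranking separate from the per-algorithm verification makes it easy to change the preset number without re-running the rounds.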
In step S732, the mean square error is calculated by the following formula:
$$E_i = \frac{1}{M-r}\sum_{k=1}^{r}(n_k-1)\,s_k^2$$
wherein $E_i$ is the mean square error obtained in the i-th round of verification, M is the total number of the result data output by training and the verification data, r is the number of groups of the result data output by training and the verification data, M-r is the degree of freedom, $n_k$ is the size of the k-th sample, and $s_k^2$ is the sample variance generated for the k-th group:
$$s_k^2 = \frac{1}{n_k-1}\sum_{l=1}^{n_k}\left(x_{kl}-\bar{x}_k\right)^2$$
wherein $x_{kl}$ represents a random variable and $\bar{x}_k$ is the mean of the k-th group;
the mean square error average is calculated according to the following formula:
$$E = \frac{1}{N}\sum_{i=1}^{N}E_i$$
wherein E is the mean square error average, N is the number of verifications, and $E_i$ is the mean square error obtained in the i-th round of verification.
The above method assesses the generalization ability of a given algorithm trained on a particular data set. By observing how the accuracy differs between rounds, the worst and best performance of the model when the algorithm is applied to new data can be learned, so the result obtained is more stable and comprehensive.
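As one possible reading of the quantities defined for step S732 (total number M, number of groups r, M-r degrees of freedom and a per-group sample variance), the per-round mean square error can be sketched as a pooled within-group variance, and the final score as its average over the rounds. The function names and the sample data below are hypothetical, and the pooled-variance interpretation itself is an assumption.

```python
from statistics import mean, variance

def round_mse(groups):
    """Pooled within-group mean square for one round: sum((n_k - 1) * s_k^2) / (M - r)."""
    total = sum(len(g) for g in groups)   # M: total number of data points
    r = len(groups)                       # r: number of groups
    pooled = sum((len(g) - 1) * variance(g) for g in groups)
    return pooled / (total - r)           # M - r degrees of freedom

def mean_of_round_mse(rounds):
    """Average of the per-round mean square errors over all verification rounds."""
    return mean(round_mse(groups) for groups in rounds)

# Two hypothetical verification rounds, each holding two groups of values.
rounds = [
    [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]],
    [[0.0, 2.0], [1.0, 1.0, 4.0]],
]
print(round_mse(rounds[0]))       # 2.5
print(mean_of_round_mse(rounds))
```

In the first round the two groups contribute 2 x 1.0 and 2 x 4.0 to the pooled sum, and dividing by M - r = 6 - 2 gives 2.5.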
In step S8, a service task is executed by the at least two optimal service algorithms.
The following is further described by specific application scenarios.
In a specific application scenario, three algorithms are used in OCR (optical character recognition): CLS (a decision tree model), DET (a target detection algorithm) and REC (a character recognition convolutional neural network). CLS covers a decision tree model training method and a method and device for determining data attributes in OCR results. The training method comprises: acquiring a sample picture and performing OCR recognition on it to generate a first OCR recognition result, which is a two-dimensional character string array in which each row holds data belonging to the same attribute row; extracting first feature information of each data item in the first OCR recognition result; acquiring first labeling data corresponding to each data item in the first OCR recognition result, the first labeling data indicating the attribute of each data item; and training on the first feature information and the first labeling data to generate a decision tree model for determining data attributes in an OCR recognition result.
This method labels the data attributes in the recognition result automatically, which effectively reduces the cost of recognizing a picture and improves recognition efficiency. DET is a target detection algorithm: given an input picture, the model outputs the positions of all characters in the picture together with their categories; visual features of the candidate regions are then extracted, and finally a classifier determines whether the pixels within each region form characters. REC recognizes the characters inside each region, predicting them with the trained model; it is the core algorithm of the OCR function.
The algorithms cooperate in one of three modes: concurrent, primary-secondary, and division of labor. In the concurrent and primary-secondary modes, a user can allocate a specific amount of resources to each algorithm, and the allocation can be regional, by point location, or random.
In the concurrent cooperation mode, several different algorithms are used for the same task at the same time; each algorithm processes the resources of a different data source and returns its result, and the task summarizes the results in a unified way. This makes full use of both the existing algorithm achievements and any algorithms built later, and avoids the risks to the reliability and accuracy of the analysis that come with relying on a single algorithm. The algorithms can also cooperate by verification: typically, the data analysis result produced by a first algorithm is handed to a second algorithm for a second check, the two are compared to see whether the same picture is processed differently, and the difference between the two calculation results is used to judge whether the analysis result is reliable.
Regional allocation means selecting resources by administrative division at each level; point allocation means selecting resources by fuzzy retrieval over the individual point locations; random allocation means dynamically and randomly adjusting the capacity of each algorithm process according to the load on the data processing.
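The concurrent and verification cooperation modes described above can be illustrated with a minimal sketch. The two algorithms, the picture identifiers and the helper names are hypothetical stand-ins, and a thread pool is only one assumed way of running the algorithms at the same time.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for two vendor algorithms; each maps a picture id to a label.
def algorithm_a(picture):
    return "text" if picture % 2 == 0 else "blank"

def algorithm_b(picture):
    return "text" if picture % 2 == 0 or picture == 3 else "blank"

def run_concurrently(assignments):
    """Concurrent mode: each algorithm processes its own data source, results are merged."""
    with ThreadPoolExecutor() as pool:
        futures = {
            name: pool.submit(lambda f=fn, d=data: {p: f(p) for p in d})
            for name, (fn, data) in assignments.items()
        }
        return {name: future.result() for name, future in futures.items()}

def cross_check(first, second, pictures):
    """Verification mode: return the pictures on which the two algorithms disagree."""
    return [p for p in pictures if first(p) != second(p)]

summary = run_concurrently({"A": (algorithm_a, [0, 1, 2]), "B": (algorithm_b, [3, 4, 5])})
disputed = cross_check(algorithm_a, algorithm_b, range(6))
print(summary)
print(disputed)  # [3]
```

Pictures returned by `cross_check` are the ones whose analysis result would be treated as unreliable and examined further.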
In step S8, integrating the execution results of the service task includes:
performing a weighted average or a simple average on the execution results to obtain an integration result;
and performing classification voting or classification probability voting on the execution results to obtain an integration result.
The execution results of the service task can be received in two different ways. In the first, a Kafka message service channel is set up and opened to the algorithm vendors, and each vendor actively pushes its returned results to the channel, so that the system perceives newly fed-back service results in real time and can process them promptly. In the second, the vendor provides an output service interface or a database for the algorithm results, and the system scans the vendor's service interface or database by timed polling to determine whether a new result has been produced.
Specifically, for regression problems, the prediction results of the various algorithm models are combined by simple averaging; the averaged result reduces overfitting and gives a smoother boundary, avoiding the rough boundary of a single model. The results calculated by an algorithm generally form a set; for the execution results of a certain classification task, refer to Table 3.
TABLE 3
[Table 3 is provided as an image in the original publication.]
Algorithm A = [0, 1, 0, 1, 1, 0, 0];
Algorithm B = [0, 0, 1, 0, 1, 0, 0];
Algorithm C = [0, 1, 1, 1, 1, 0, 1];
In the results, 0 indicates type α and 1 indicates type β; neither type ranks above the other. Reading from left to right, each entry of the result sets of algorithms A, B and C is the prediction for the corresponding sample, and the results are integrated as follows:
Simple averaging:
[The simple-averaging equations are provided as images in the original publication.]
wherein S represents the integration result, α and β represent the categories, A, B and C represent the result sets under the different algorithms, and the probability terms represent the probability of category α in the result set of each algorithm.
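For illustration only, simple averaging over the result sets A, B and C listed above could be sketched in two ways: a per-sample average of the 0/1 predictions, and an average over the algorithms of the share of one category in each result set. Both readings, the 0.5 threshold and all function names are assumptions, since the original equations are not available in the text.

```python
A = [0, 1, 0, 1, 1, 0, 0]
B = [0, 0, 1, 0, 1, 0, 0]
C = [0, 1, 1, 1, 1, 0, 1]

def simple_average(result_sets):
    """Per-sample mean of the 0/1 predictions across all algorithms."""
    return [sum(column) / len(column) for column in zip(*result_sets)]

def to_labels(scores, threshold=0.5):
    """Scores above the threshold become 1 (type beta), the rest 0 (type alpha)."""
    return [1 if s > threshold else 0 for s in scores]

def class_share(result_sets, label):
    """Share of one label within each result set, averaged over the algorithms."""
    shares = [rs.count(label) / len(rs) for rs in result_sets]
    return sum(shares) / len(shares)

scores = simple_average([A, B, C])
integrated = to_labels(scores)
print(integrated)                 # [0, 1, 1, 1, 1, 0, 0]
print(class_share([A, B, C], 0))  # average share of type alpha, 11/21
```

Under the per-sample reading, the second to fifth samples are integrated as type β and the remaining samples as type α.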
Weighted average:
The accuracy of each algorithm is
$$\text{Accuracy} = \frac{TP+TN}{TP+TN+FP+FN}$$
Calculating the algorithm weights:
[The weight equations are provided as images in the original publication.]
where TP indicates that a positive determination is made and the determination is correct, FP indicates that a positive determination is made but the determination is wrong, TN indicates that a negative determination is made and the determination is correct, and FN indicates that a negative determination is made but the determination is wrong.
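A minimal sketch of accuracy-weighted averaging follows, assuming that type β (1) is the positive class, that each weight is the algorithm's accuracy divided by the sum of all accuracies, and that a ground-truth labeling is available. The ground truth used here is hypothetical and is not taken from this embodiment, and the normalization of the weights is likewise an assumption.

```python
A = [0, 1, 0, 1, 1, 0, 0]
B = [0, 0, 1, 0, 1, 0, 0]
C = [0, 1, 1, 1, 1, 0, 1]
truth = [0, 1, 1, 1, 1, 0, 0]  # hypothetical ground truth for illustration only

def accuracy(predicted, actual):
    """(TP + TN) / (TP + TN + FP + FN), with label 1 treated as positive."""
    tp = sum(p == 1 and a == 1 for p, a in zip(predicted, actual))
    tn = sum(p == 0 and a == 0 for p, a in zip(predicted, actual))
    fp = sum(p == 1 and a == 0 for p, a in zip(predicted, actual))
    fn = sum(p == 0 and a == 1 for p, a in zip(predicted, actual))
    return (tp + tn) / (tp + tn + fp + fn)

def accuracy_weights(result_sets, actual):
    """Assumed weighting: each accuracy divided by the sum of all accuracies."""
    acc = {name: accuracy(rs, actual) for name, rs in result_sets.items()}
    total = sum(acc.values())
    return {name: value / total for name, value in acc.items()}

def weighted_average(result_sets, weights):
    """Per-sample weighted mean of the 0/1 predictions."""
    names = list(result_sets)
    length = len(result_sets[names[0]])
    return [sum(weights[n] * result_sets[n][i] for n in names) for i in range(length)]

result_sets = {"A": A, "B": B, "C": C}
weights = accuracy_weights(result_sets, truth)
scores = weighted_average(result_sets, weights)
labels = [1 if s > 0.5 else 0 for s in scores]
print(weights)
print(labels)  # [0, 1, 1, 1, 1, 0, 0]
```

With this hypothetical ground truth, algorithms A and C are each correct on six of seven samples and B on five, so the weights are 6/17, 5/17 and 6/17.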
Further, classification voting takes the output of each algorithm as input, uses the KNN (k-nearest neighbors) algorithm to convert the one-dimensional results into N samples in a two-dimensional feature space, calculates the distance from the test sample to every other sample point, sorts the distances, selects the K points with the smallest distances, compares the categories to which these K points belong, and, following the majority rule, assigns the test sample point to the category that accounts for the largest share of the K points.
If most of the K most similar samples in the feature space belong to a certain class, the sample is assigned to that class. The KNN algorithm is suited to automatic classification of class domains with large sample sizes, and therefore to classification voting.
The classification part of the KNN algorithm is straightforward, but, as the name itself suggests, two things are not easy to determine: how to choose "K" and how to define "NN" (the nearest neighbors). "Samples of the same class attract each other" is the guiding idea of the KNN classification algorithm, which allows the machine learning model to classify without relying on the deviation. Real samples have many dimensions, and the distribution of the sample data points looks different from one dimension to another. If a sample has 4 dimensions and any 2 of them are taken as the X-axis and Y-axis of a plot each time, 16 plots are obtained.
It can be seen that, for the same samples, choosing different dimensions produces a more intricate, interlocking relationship between the classes, while the tendency of a class to cluster becomes less obvious: the samples of a class spread over a wider range and are more likely to mix with samples of other classes. For KNN, the class is determined by distance. Specifically, each data point is plotted according to the values of the sample in each dimension, and only the distances between points need to be measured; to classify a given point, it is enough to take that point as the center of a circle and find the points adjacent to it, which together decide the class. Only the points inside the circle have a vote on which class the point belongs to; the vote is not cast by the entire sample set. Different algorithm models have different adjustable parameters. In the KNN algorithm, the number of selected points, that is, the K in KNN, is a parameter that must be tuned to the actual situation to obtain a better fit; it can be set from working experience and generally takes a value from 3 to 10.
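The distance-based vote described above can be sketched in a few lines. The sample points, the labels and the function name are hypothetical, Euclidean distance is assumed as the distance measure, and K = 3 is chosen from the 3 to 10 range mentioned above.

```python
from collections import Counter
from math import dist

def knn_classify(samples, query, k=3):
    """Return the majority label among the k samples nearest to the query point."""
    nearest = sorted(samples, key=lambda sample: dist(sample[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical labeled points in a two-dimensional feature space.
samples = [
    ((0.0, 0.0), "alpha"), ((0.0, 1.0), "alpha"), ((1.0, 0.0), "alpha"),
    ((5.0, 5.0), "beta"), ((5.0, 6.0), "beta"), ((6.0, 5.0), "beta"),
]
print(knn_classify(samples, (0.5, 0.5)))  # alpha
print(knn_classify(samples, (5.5, 5.5)))  # beta
```

Only the K nearest points take part in the vote, which corresponds to the points inside the circle in the description above.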
In a specific application scenario, assuming that there are three independent models, each with 70% accuracy, and that voting follows the majority rule, the final accuracy is:
$$0.7^3 + 3\times 0.7^2\times 0.3 = 0.343 + 0.441 = 0.784$$
That is, simple classification voting on the results improves the accuracy by about 8 percentage points. This is a simple probability problem: the more algorithm results take part in the vote, the better the result, provided that the algorithm models are independent of each other and their results are uncorrelated. The more similar the integrated algorithm models are, the poorer the integration effect; the greater the difference between the algorithm models, that is, the lower their correlation, the better the integration result, and this property does not depend on the integration mode.
The method further comprises checking the operation of the algorithms by means of algorithm monitoring and data reconciliation. Algorithm monitoring provides, for each algorithm in use, the ability to monitor how the algorithm is processing, including its current state and its processing flow, and raises a timely alarm for any abnormal algorithm. For example, the heartbeat monitoring records of an abnormal algorithm can be inspected; each record includes a serial number, the monitoring time, the response speed (in milliseconds) and the monitoring result (no response, slow response, or normal response). The heartbeat data packets are simulated data generated from existing data; they are sent to the trained model, which predicts and returns a result in real time.
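A heartbeat check of this kind might be sketched as follows. The record fields follow the description above, while the 500 ms "slow" threshold, the model stand-ins and the function names are hypothetical assumptions.

```python
import time
from dataclasses import dataclass
from typing import Optional

@dataclass
class HeartbeatRecord:
    serial_number: int
    monitoring_time: float        # Unix timestamp of the check
    response_ms: Optional[float]  # None when the algorithm did not answer
    result: str                   # "no response", "slow response" or "normal response"

def check_heartbeat(serial_number, predict, packet, slow_ms=500.0):
    """Send one simulated packet to a model and classify how it responded."""
    started = time.perf_counter()
    try:
        predict(packet)
    except Exception:
        # Any failure to answer is recorded as "no response" in this sketch.
        return HeartbeatRecord(serial_number, time.time(), None, "no response")
    elapsed_ms = (time.perf_counter() - started) * 1000.0
    result = "slow response" if elapsed_ms > slow_ms else "normal response"
    return HeartbeatRecord(serial_number, time.time(), elapsed_ms, result)

# Hypothetical models standing in for trained algorithms.
def healthy_model(packet):
    return sum(packet)

def sleepy_model(packet):
    time.sleep(0.05)
    return sum(packet)

def dead_model(packet):
    raise TimeoutError("no answer")

records = [
    check_heartbeat(1, healthy_model, [1, 2, 3]),
    check_heartbeat(2, sleepy_model, [1, 2, 3], slow_ms=10.0),
    check_heartbeat(3, dead_model, [1, 2, 3]),
]
alarms = [r.serial_number for r in records if r.result != "normal response"]
print(alarms)  # [2, 3]
```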
Data reconciliation is the process of checking and verifying the number of data records, the data size and the data fingerprint when a data provider and a data access party exchange data. It includes reconciling, from the running log of each algorithm, the amount of data the algorithm accessed against the amount of data it analyzed and output, and checking whether the algorithm has omitted any data. After reconciliation is finished, the reconciliation result must be checked, and a log must be recorded when the reconciliation is abnormal. According to the scenario, data reconciliation is divided into data access reconciliation and data distribution reconciliation. A reconciliation record includes: serial number, reconciliation time, access data type, access data volume, output data volume and reconciliation result.
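As a hedged illustration, reconciling the number of records, the data size and the data fingerprint between a provider and an access party could look like this. SHA-256 is assumed as the fingerprint, records are assumed to be byte strings, and the function and field names are hypothetical.

```python
import hashlib

def fingerprint(records):
    """SHA-256 over all records; each record is length-prefixed so boundaries matter."""
    digest = hashlib.sha256()
    for record in records:
        digest.update(len(record).to_bytes(8, "big"))
        digest.update(record)
    return digest.hexdigest()

def reconcile(sent, received):
    """Compare count, total size and fingerprint of the two sides of an exchange."""
    report = {
        "count_match": len(sent) == len(received),
        "size_match": sum(map(len, sent)) == sum(map(len, received)),
        "fingerprint_match": fingerprint(sent) == fingerprint(received),
    }
    report["result"] = "normal" if all(report.values()) else "abnormal"
    return report

print(reconcile([b"a", b"bc"], [b"a", b"bc"])["result"])  # normal
print(reconcile([b"a", b"bc"], [b"a"])["result"])         # abnormal
print(reconcile([b"ab"], [b"ba"]))
```

The last call shows why the fingerprint is checked in addition to count and size: both of those match, yet the content differs.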
Referring to fig. 3, in some embodiments, there is provided a multi-algorithm integration apparatus applied to the above method, including:
the registration module 101 is configured to collect registration information of a plurality of service algorithms and perform registration;
the interface integration module 102 is used for integrating interfaces of a plurality of registered service algorithms;
a database calling module 103, configured to call databases of multiple registered service algorithms, and receive data of each database;
a database feature extraction module 104, configured to perform feature extraction on data of each database;
the database integration module 105 is used for performing association, comparison and identification processing on the extracted features and performing classified storage according to feature types;
a receiving module 106, configured to receive a service task request;
a determining module 107, configured to determine at least two optimal service algorithms from the registered multiple service algorithms according to the service task request;
and the result integration module 108 is configured to execute the service task through the at least two optimal service algorithms and integrate an execution result of executing the service task.
Specifically, the interface integration module 102 is further configured to perform unified assignment on the interface parameters and the return result formats of the service algorithms to form a universal interface.
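The unified assignment of interface parameters and return formats can be pictured as a thin adapter layer. In the sketch below the two vendor functions, their signatures, the request keys and the `UnifiedResult` fields are all hypothetical; the point is only that every registered algorithm is called with one request format and answers in one result format.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class UnifiedResult:
    algorithm: str
    label: str
    confidence: float  # always in the range 0.0 to 1.0

# Hypothetical vendor algorithms with incompatible signatures and return formats.
def vendor_x_detect(image_path, threshold):
    return ("face", 0.91) if threshold <= 0.91 else ("none", 0.0)

def vendor_y_infer(payload):
    return {"cls": "face", "prob": 87}  # probability reported in percent

class UniversalInterface:
    """Maps one request format onto each registered algorithm and back."""

    def __init__(self):
        self._adapters: Dict[str, Callable[[dict], UnifiedResult]] = {}

    def register(self, name, adapter):
        self._adapters[name] = adapter

    def run_all(self, request):
        return [adapter(request) for adapter in self._adapters.values()]

def adapt_x(request):
    label, score = vendor_x_detect(request["image"], request["min_confidence"])
    return UnifiedResult("vendor_x", label, score)

def adapt_y(request):
    raw = vendor_y_infer({"path": request["image"]})
    return UnifiedResult("vendor_y", raw["cls"], raw["prob"] / 100.0)

interface = UniversalInterface()
interface.register("vendor_x", adapt_x)
interface.register("vendor_y", adapt_y)
results = interface.run_all({"image": "sample.jpg", "min_confidence": 0.5})
print(results)
```

Because every adapter returns the same structure, the results can be passed directly to an averaging or voting step.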
In some embodiments, the determining module 107 is further configured to parse the service task request to obtain a setting parameter related to the service task; calling at least three service algorithms of different types, inputting the setting parameters into the service algorithms of different types, and obtaining output result parameters; and verifying the output result parameters, and determining at least two optimal service algorithms according to the verification result.
In some embodiments, the determining module 107 is further configured to respectively establish N data sets according to the N output result parameters, where N is an integer greater than or equal to 3; in each round of verification, one data set is selected from the data sets to serve as verification data, the other data sets serve as training data, the training data are input into a service algorithm to be trained, then the verification data are input to be verified, the mean square error is calculated, and the mean square error average value of the service algorithm is calculated after N rounds of verification; performing the verification on each service algorithm to obtain a mean square error average value of each service algorithm, and sequencing the mean square error average values in an ascending order according to the mean square error average values of the service algorithms; and selecting a preset number of service algorithms which are ranked in the front to determine as the optimal service algorithm, wherein the preset number is more than or equal to two.
In some embodiments, the result integration module 108 is further configured to perform weighted average or simple average on the execution results to obtain integration results, and perform classification voting on the execution results to obtain integration results.
Referring to fig. 4, in some embodiments, an electronic device is provided, which includes a processor 1 and a storage 2, where the storage 2 stores a plurality of instructions, and the processor 1 is configured to read the plurality of instructions and execute the method described above.
The multi-algorithm integration method and the multi-algorithm integration device provided by the embodiment integrate the interfaces of a plurality of service algorithms and the related databases to form external interfaces with uniform formats and databases which are more convenient to call and compare, improve the running speed of the plurality of algorithms, and thus improve the execution efficiency of service tasks; the optimal algorithms are matched from the multiple algorithms, the obtained results have better generalization capability and are more stable and comprehensive, and the error of algorithm execution is minimized, so that the accuracy of task execution is improved; the algorithm results are integrated through a voting method and an averaging method, and then training is performed through a parallel or serial mode, so that a service task execution mode with higher accuracy is obtained, and the performance of the multi-algorithm model is further optimized.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention. It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (8)

1. A method for multi-algorithm integration, comprising:
collecting registration information of a plurality of service algorithms and registering;
integrating interfaces of a plurality of registered service algorithms;
calling a plurality of registered databases of service algorithms and receiving data of each database;
extracting the characteristics of the data of each database;
performing association, comparison and identification processing on the extracted features, and performing classified storage according to feature types;
receiving a service task request;
determining at least two optimal service algorithms from the registered service algorithms according to the service task request;
executing the service tasks through the at least two optimal service algorithms, and integrating the execution results of the executed service tasks;
determining at least two optimal service algorithms from the registered plurality of service algorithms according to the service task request, comprising:
analyzing the service task request to obtain a setting parameter related to the service task;
calling at least three service algorithms of different types, inputting the setting parameters into the service algorithms of different types, and obtaining output result parameters;
verifying the output result parameters, and determining at least two optimal service algorithms according to verification results;
verifying the output result parameters, and determining at least two optimal service algorithms according to the verification result, wherein the optimal service algorithms comprise:
respectively establishing N data sets according to the N output result parameters, wherein N is an integer greater than or equal to 3;
in each round of verification, one data set is selected from the data sets to serve as verification data, the other data sets serve as training data, the training data are input into a service algorithm to be trained, then the verification data are input to be verified, the mean square error is calculated, and the mean square error average value of the service algorithm is calculated after N rounds of verification;
performing the verification on each service algorithm to obtain a mean square error average value of each service algorithm, and sequencing the mean square error average values in an ascending order according to the mean square error average values of the service algorithms;
and selecting a preset number of service algorithms which are ranked in the front to determine as the optimal service algorithm, wherein the preset number is more than or equal to two.
2. The method of claim 1, wherein the interface integration of the registered plurality of service algorithms comprises:
and carrying out unified assignment on the interface parameters and the return result formats of all the service algorithms to form a universal interface.
3. The method of claim 1, wherein the types of service algorithms include classification algorithms, regression algorithms, and clustering algorithms.
4. The method of claim 1, wherein the mean square error is calculated by the following equation:
$$E_i = \frac{1}{M-r}\sum_{k=1}^{r}(n_k-1)\,s_k^2$$
wherein $E_i$ is the mean square error obtained in the i-th round of verification, M is the total number of the result data output by training and the verification data, r is the number of groups of the result data output by training and the verification data, M-r is the degree of freedom, $n_k$ is the size of the k-th sample, and $s_k^2$ is the sample variance generated for the k-th group;
the mean square error average is calculated according to the following formula:
$$E = \frac{1}{N}\sum_{i=1}^{N}E_i$$
wherein E is the mean square error average, N is the number of verifications, and $E_i$ is the mean square error obtained in the i-th round of verification.
5. The method of claim 1, wherein integrating results of executing service tasks comprises:
and carrying out weighted average or simple average on the execution result to obtain an integrated result.
6. The method of claim 1, wherein integrating results of executing service tasks comprises:
and carrying out classification voting on the execution result to obtain an integration result.
7. A multi-algorithm integration apparatus applied to the method according to any one of claims 1 to 6, comprising:
the registration module is used for acquiring registration information of a plurality of service algorithms and registering;
the interface integration module is used for integrating the interfaces of the registered service algorithms;
the database calling module is used for calling the registered databases of the plurality of service algorithms and receiving the data of each database;
the database feature extraction module is used for extracting features of data of each database;
the database integration module is used for performing association, comparison and identification processing on the extracted features and performing classified storage according to feature types;
the receiving module is used for receiving a service task request;
the determining module is used for determining at least two optimal service algorithms from the registered service algorithms according to the service task request;
and the result integration module is used for executing the service tasks through the at least two optimal service algorithms and integrating the execution results of the executed service tasks.
8. An electronic device comprising a processor and a storage device, the storage device storing a plurality of instructions, the processor being configured to read the plurality of instructions and to perform the method according to any one of claims 1-6.
CN202210519444.0A 2022-05-13 2022-05-13 Multi-algorithm integration method and device Active CN114625901B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210519444.0A CN114625901B (en) 2022-05-13 2022-05-13 Multi-algorithm integration method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210519444.0A CN114625901B (en) 2022-05-13 2022-05-13 Multi-algorithm integration method and device

Publications (2)

Publication Number Publication Date
CN114625901A CN114625901A (en) 2022-06-14
CN114625901B true CN114625901B (en) 2022-08-05

Family

ID=81907170

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210519444.0A Active CN114625901B (en) 2022-05-13 2022-05-13 Multi-algorithm integration method and device

Country Status (1)

Country Link
CN (1) CN114625901B (en)



Also Published As

Publication number Publication date
CN114625901A (en) 2022-06-14


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant