CN117853845A - Machine vision model training method and system

Machine vision model training method and system

Info

Publication number
CN117853845A
Authority
CN
China
Prior art keywords
video
sample
server
video stream
target
Prior art date
Legal status
Pending
Application number
CN202410052299.9A
Other languages
Chinese (zh)
Inventor
吕慧奇
王璨
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202410052299.9A
Publication of CN117853845A

Abstract

The disclosure provides a machine vision model training method and system, relating to the technical field of image processing and in particular to the technical field of computer vision. The specific implementation scheme is as follows: acquiring target prompt words for describing a target scene and shooting parameters of a target image acquisition device; performing intelligent drawing according to the target prompt words to obtain a first sample image; synthesizing the first sample image as a video frame to obtain a sample video; circularly pushing the sample video in the form of a video stream, and acquiring a video stream address of the video stream; and sending the video stream address to a model training end so that the model training end trains, based on the video stream, a machine vision model applicable to the target scene. The method can effectively improve the training efficiency of the machine vision model.

Description

Machine vision model training method and system
Technical Field
The disclosure relates to the technical field of image processing, in particular to the technical field of computer vision, and more particularly to a machine vision model training method and system.
Background
An image may be identified using a machine vision model to obtain corresponding information, for example, identifying an image of a building to determine whether the building is on fire. For the machine vision model to accurately identify whether a fire has occurred, it needs to be trained in advance using sample data.
Disclosure of Invention
The disclosure provides a machine vision model training method and system.
According to an aspect of the present disclosure, there is provided a machine vision model training method, including:
acquiring a target prompt word for describing a target scene and shooting parameters of target image acquisition equipment;
performing intelligent drawing according to the target prompt word to obtain a first sample image;
synthesizing the first sample image as a video frame to obtain a sample video;
circularly pushing the sample video in a video stream mode, and acquiring a video stream address of the video stream;
and sending the video stream address to a model training end so that the model training end trains a machine vision model applicable to the target scene based on the video stream.
In a possible embodiment, the acquiring the target prompt word for describing the target scene and the shooting parameters of the target image capturing device includes:
displaying a guide interface comprising alternative prompt words corresponding to a plurality of scenes and alternative prompt words corresponding to shooting parameters of a plurality of image acquisition devices, wherein the alternative prompt words are used for describing features possessed and/or features not possessed by the corresponding scene or by the shooting parameters of the corresponding image acquisition device;
And responding to the input operation aiming at the guide interface, and identifying the prompt word indicated by the input operation as a target prompt word.
In a possible embodiment, the circularly pushing the sample video in the form of a video stream and acquiring a video stream address of the video stream includes:
displaying the communication address of each pre-configured push server;
in response to an address selection operation for a communication address, identifying the communication address selected by the address selection operation as a target communication address;
sending the sample video to a push server according to the target communication address so as to control the push server to circularly push the sample video in a video stream mode;
and acquiring a video stream address of the video stream fed back by the push server.
In one possible embodiment, the method further comprises:
according to the communication address of each push server which is pre-configured, the load of each push server is respectively obtained from each push server, wherein the load comprises one or more of the following loads: memory load, disk load, processor load;
the displaying the communication address of each pre-configured push server comprises the following steps:
And correspondingly displaying the communication address and the load of each push server.
In one possible embodiment, the method further comprises:
determining a push server whose load is greater than a preset load threshold as an abnormal push server;
and sending an alarm message to a management terminal preset for the abnormal push server.
In one possible embodiment, the method further comprises:
responding to video stream management operation, displaying thumbnail information of each video stream being pushed, wherein the thumbnail information is used for describing a scene presented by the video stream;
and responding to the video stream selection operation aiming at the thumbnail information, identifying the video stream selected by the video stream selection operation, and sending the video stream address of the selected video stream to a model training end so that the model training end trains to obtain a machine vision model based on the selected video stream.
In a possible embodiment, the synthesizing the first sample image for a video frame to obtain a sample video includes:
displaying a video synthesis interface;
responding to an image uploading operation aiming at the video synthesis interface, and acquiring a second sample image uploaded by the image uploading operation;
and synthesizing the first sample image and the second sample image as video frames to obtain a sample video.
In one possible embodiment, the method further comprises:
in response to a format configuration operation for the video composition interface, identifying configuration information indicated by the format configuration operation, wherein the configuration information is used to constrain one or more of the following parameters of a video: the duration of each sample image in the video, the resolution of the video, the code rate of the video, the coding format of the video and the frame rate of the video;
the step of synthesizing the first sample image and the second sample image as video frames to obtain a sample video comprises the following steps:
and synthesizing the first sample image and the second sample image as video frames to obtain the sample video conforming to the constraint of the configuration information.
In a possible embodiment, the synthesizing the first sample image for a video frame to obtain a sample video includes:
responding to the operation of converting the picture into the video, and synthesizing the first sample image as a video frame to obtain a sample video;
the method further comprises the steps of:
and responding to the video uploading operation, and acquiring the video uploaded by the video uploading operation as a sample video.
In a possible embodiment, the acquiring the target prompt word for describing the target scene and the shooting parameters of the target image capturing device includes:
Acquiring a plurality of groups of target prompt words for describing a target scene and shooting parameters of a target image acquisition device, wherein each group of target prompt words is different;
the intelligent drawing is carried out according to the target prompt word to obtain a first sample image, which comprises the following steps:
respectively carrying out intelligent drawing according to each group of target prompt words to obtain a plurality of first sample images;
the step of synthesizing the first sample image as a video frame to obtain a sample video includes:
and synthesizing the plurality of first sample images as video frames to obtain a sample video.
According to another aspect of the present disclosure, there is provided a machine vision model training system comprising:
the system comprises a main control server, a platform server and a push server;
the main control server is used for accessing the user terminal, receiving a target prompt word which is sent by the user terminal and used for describing a target scene and shooting parameters of a target image acquisition device, and sending the target prompt word to the platform server;
the platform server is used for intelligently drawing according to the target prompt words to obtain a first sample image, and feeding the first sample image back to the user terminal through the main control server;
The main control server is also used for receiving the sample image sent by the user terminal and sending the sample image to the push server through the platform server;
the push server is used for synthesizing the sample images as video frames to obtain sample videos, circularly pushing the sample videos in a video stream mode, and sending video stream addresses of the video streams to the user terminal through the platform server and the main control server, so that the user terminal trains a machine vision model applicable to the target scene based on the video stream address control model training end.
In one possible embodiment, the system further comprises a proxy server;
the main control server and the platform server are positioned in a first network, and the push server is positioned in a second network;
and the proxy server is used for proxy interaction between the push server and the platform server.
In one possible embodiment, the system further comprises a cloud storage server;
the main control server is also used for receiving the sample video sent by the user terminal and sending the sample video to the platform server;
The platform server is further used for sending the sample video to the cloud storage server;
the cloud storage server is used for receiving and storing the sample video and sending an external link of the sample video to the platform server;
the platform server is further configured to send the external link to the proxy server;
the proxy server is further configured to obtain the sample video from the cloud storage server based on the external link, and send the sample video to the push server.
In a possible embodiment, the master control server is further configured to receive a communication address sent by the user terminal, and send the communication address to the cloud storage server through the platform server for storage;
the proxy server is further configured to obtain the communication address from the cloud storage server, access a network device corresponding to the communication address, and control the network device to enable a streaming media service, so that the network device serves as a new push server in the system.
According to another aspect of the present disclosure, there is provided a machine vision model training apparatus comprising:
the prompt word acquisition module is used for acquiring target prompt words for describing a target scene and shooting parameters of the target image acquisition equipment;
The image generation module is used for intelligently drawing according to the target prompt word to obtain a first sample image;
the video synthesis module is used for synthesizing the first sample image serving as a video frame to obtain a sample video;
the pushing module is used for circularly pushing the sample video in a video stream mode and acquiring a video stream address of the video stream;
and the export module is used for sending the video stream address to a model training end so that the model training end trains a machine vision model applicable to the target scene based on the video stream.
In a possible embodiment, the acquiring the target prompt word for describing the target scene and the shooting parameters of the target image capturing device includes:
displaying a guide interface comprising alternative prompt words corresponding to a plurality of scenes and alternative prompt words corresponding to shooting parameters of a plurality of image acquisition devices, wherein the alternative prompt words are used for describing features possessed and/or features not possessed by the corresponding scene or by the shooting parameters of the corresponding image acquisition device;
and responding to the input operation aiming at the guide interface, and identifying the prompt word indicated by the input operation as a target prompt word.
In a possible embodiment, the circularly pushing the sample video in the form of a video stream and acquiring a video stream address of the video stream includes:
displaying the communication address of each pre-configured push server;
in response to an address selection operation for a communication address, identifying the communication address selected by the address selection operation as a target communication address;
sending the sample video to a push server according to the target communication address so as to control the push server to circularly push the sample video in a video stream mode;
and acquiring a video stream address of the video stream fed back by the push server.
In one possible embodiment, the apparatus further comprises:
the load acquisition module is used for respectively acquiring the loads of the push servers from the push servers according to the communication addresses of the push servers which are configured in advance, wherein the loads comprise one or more of the following loads: memory load, disk load, processor load;
the displaying the communication address of each pre-configured push server comprises the following steps:
and correspondingly displaying the communication address and the load of each push server.
In one possible embodiment, the apparatus further comprises:
The abnormal push server determining module is used for determining a push server whose load is greater than a preset load threshold as an abnormal push server;
and the alarm message sending module is used for sending alarm messages to a management terminal preset for the abnormal push server.
In one possible embodiment, the apparatus further comprises:
the thumbnail information display module is used for responding to the video stream management operation and displaying thumbnail information of each video stream being pushed, wherein the thumbnail information is used for describing scenes presented by the video streams;
the video stream selection module is used for responding to the video stream selection operation aiming at the thumbnail information, identifying the video stream selected by the video stream selection operation, and sending the video stream address of the selected video stream to the model training end so that the model training end trains to obtain the machine vision model based on the selected video stream.
In a possible embodiment, the synthesizing the first sample image for a video frame to obtain a sample video includes:
displaying a video synthesis interface;
responding to an image uploading operation aiming at the video synthesis interface, and acquiring a second sample image uploaded by the image uploading operation;
And synthesizing the first sample image and the second sample image as video frames to obtain a sample video.
In one possible embodiment, the apparatus further comprises:
a configuration information identifying module, configured to identify, in response to a format configuration operation for the video composition interface, configuration information indicated by the format configuration operation, wherein the configuration information is used to constrain one or more of the following parameters of a video: the duration of each sample image in the video, the resolution of the video, the code rate of the video, the coding format of the video and the frame rate of the video;
the step of synthesizing the first sample image and the second sample image as video frames to obtain a sample video comprises the following steps:
and synthesizing the first sample image and the second sample image as video frames to obtain the sample video conforming to the constraint of the configuration information.
In a possible embodiment, the synthesizing the first sample image for a video frame to obtain a sample video includes:
responding to the operation of converting the picture into the video, and synthesizing the first sample image as a video frame to obtain a sample video;
the method further comprises the steps of:
and responding to the video uploading operation, and acquiring the video uploaded by the video uploading operation as a sample video.
In a possible embodiment, the acquiring the target prompt word for describing the target scene and the shooting parameters of the target image capturing device includes:
acquiring a plurality of groups of target prompt words for describing a target scene and shooting parameters of a target image acquisition device, wherein each group of target prompt words is different;
the intelligent drawing is carried out according to the target prompt word to obtain a first sample image, which comprises the following steps:
respectively carrying out intelligent drawing according to each group of target prompt words to obtain a plurality of first sample images;
the step of synthesizing the first sample image as a video frame to obtain a sample video includes:
and synthesizing the plurality of first sample images as video frames to obtain a sample video.
According to another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform any one of the machine vision model training methods described above.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform any one of the machine vision model training methods described above.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a machine vision model training method of any of the above.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic flow chart of a machine vision model training method provided in the present application;
fig. 2a is a schematic diagram of a picture generation interface provided in the present application;
fig. 2b is a schematic diagram of a picture generation interface supporting forward and reverse hint words provided in the present application;
FIG. 2c is a schematic diagram of a picture generation interface provided with user guidance provided herein;
FIG. 3a is a schematic diagram of a sample image display interface provided herein;
FIG. 3b is a schematic diagram illustrating the operation of saving a sample image by a user through a sample image display interface provided in the present application;
FIG. 4a is a schematic diagram of a video composition interface provided herein;
Fig. 4b is a schematic diagram of an operation of selecting a sample image local to a user terminal by a user provided in the present application;
FIG. 4c is a schematic diagram of a video composition interface provided herein after a user has selected a sample image;
FIG. 4d is a schematic diagram of a video composition interface provided herein after a user has completed sample image selection;
FIG. 4e is a schematic diagram of a video composition interface providing format configuration functionality provided herein;
FIG. 4f is a schematic diagram of an interface provided herein after a user clicks the format configuration control of FIG. 4e;
FIG. 5a is a schematic diagram of a push interface provided herein;
FIG. 5b is another schematic illustration of the push interface provided herein;
FIG. 5c is a schematic diagram of a push interface supporting user-specified push servers provided in the present application;
FIG. 5d is another schematic diagram of a push interface provided in the present application that supports user-specified push servers;
FIG. 6 is a schematic diagram of a model training interface provided herein;
FIG. 7a is a schematic structural diagram of a platform for implementing machine vision model training provided herein;
FIG. 7b is a schematic diagram of another configuration of a platform for implementing machine vision model training provided herein;
FIG. 8 is an interaction diagram of the platform shown in FIG. 7a in implementing a machine vision model training process;
FIG. 9a is an interaction diagram of the platform of FIG. 7b in implementing a machine vision model training process based on a synthesized video;
FIG. 9b is an interaction diagram of the platform of FIG. 7b in implementing a machine vision model training process based on video uploaded by a user;
FIG. 10a is a schematic diagram of a video stream management interface provided herein;
FIG. 10b is a schematic diagram of a video stream management interface provided in the present application that supports user stop pushing;
FIG. 10c is a schematic diagram of a video stream management interface supporting user recovery of a push stream of a video stream provided in the present application;
FIG. 11a is a schematic diagram of a server management interface provided herein;
FIG. 11b is another schematic diagram of a server management interface provided herein;
FIG. 11c is a schematic interface diagram after clicking on the add server control of FIG. 11a;
fig. 12 is an interaction diagram of a user terminal in the process of obtaining information of a push server;
FIG. 13 is an interaction diagram in the process of adding a push server provided by the present application;
FIG. 14 is a schematic structural view of a machine vision model training device provided in the present application;
FIG. 15 is a block diagram of an electronic device for implementing a machine vision model training method of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
To explain the machine vision model training method provided in the present application more clearly, its application will be illustrated below using fire identification as an example scenario. It will be understood that fire identification is only one possible application scenario of the machine vision model training method provided in the present application; in other possible embodiments, the method may be applied to other possible scenarios, and the following examples do not constitute any limitation.
In order to handle a fire in time and reduce the loss it causes, in the related art a user may deploy image acquisition devices in each area of a building. The image acquisition devices acquire images of each area of the building and send them to a server on which a machine vision model for identifying fires is pre-deployed; the server inputs the received images into the machine vision model to identify whether a fire has occurred, and raises an alarm when a fire is identified.
However, the user is usually not a person skilled in the computer vision field and does not know how to train the machine vision model, so the user often trains the model through a model training service provided by a third party. These model training services generally require the user to input the address of a network camera, and obtain the video data shot by the network camera from that address as the sample data used in training.
In addition, in order for the trained machine vision model to accurately recognize a fire, the user cannot input the address of just any network camera, but needs to input the address of a network camera that can capture a fire scene. However, the probability of a fire occurring is low, and fires are unpredictable to a certain extent, so a fire scene is difficult for a network camera to capture. It is therefore difficult for the user to obtain the address of a network camera that can capture a fire scene, which makes it hard to train a machine vision model that can accurately identify fires, and the training efficiency is low.
Based on this, the present application provides a machine vision model training method to effectively improve training efficiency of a machine vision model, referring to fig. 1, fig. 1 is a flow diagram of the machine vision model training method provided by the present application, which includes:
Step S101, acquiring a target prompt word for describing a target scene and shooting parameters of a target image acquisition device.
And step S102, intelligent drawing is carried out according to the target prompt word, and a first sample image is obtained.
Step S103, synthesizing the first sample image as a video frame to obtain a sample video.
Step S104, circularly pushing the sample video in the form of a video stream, and acquiring a video stream address of the video stream.
Step S105, the video stream address is sent to a model training end, so that the model training end trains a machine vision model applicable to a target scene based on the video stream.
According to the machine vision model training method, intelligent drawing is performed according to the target scene and the shooting parameters of the target image acquisition device described by the target prompt words, and the drawn images are synthesized into a video. Because the target prompt words also describe the shooting parameters of the target image acquisition device, the shooting process of a real network camera can be effectively simulated by setting the target prompt words reasonably. In addition, by circularly pushing the sample video, the process in which a network camera pushes the video data it shoots in the form of a video stream is simulated. The process of a network camera shooting video and pushing a video stream is thus completely simulated, so the obtained video stream address can be regarded as the address of a network camera and used for model training. Meanwhile, because the target scene is described by the target prompt words rather than being a scene that actually occurred, a video stream for machine vision training can be obtained relatively easily even if the target scene is a rare one. In other words, the difficulty of acquiring training data for rare scenes is reduced, and the training efficiency of machine vision models for detecting or identifying rare scenes is effectively improved.
The flowchart shown in fig. 1 describes the machine vision model training method provided in the present application from the perspective of a device. The method will be described below by taking a scenario in which a user needs to train a machine vision model as an example, in conjunction with the interaction between user operations and the device. It will be understood that the user operations involved in the following description are only one possible example; in other possible embodiments, a user may train a machine vision model using the method provided in the present application through other operation manners.
First, the user accesses a platform through a browser or another application with a remote access function on a user terminal. The platform is used for implementing the machine vision model training method provided by the present application, so it can be regarded as a machine vision model training system; for convenience of description, it is simply referred to as the platform hereinafter. At this time, the user terminal presents a picture generation interface as shown in fig. 2a under the control of the platform.
The picture generation interface comprises an input window 211 and a confirmation control 212, a user can input a prompt word in the input window 211 and click the confirmation control 212 after finishing the input of the prompt word, so as to drive the user terminal to send the prompt word input by the user in the input window 211 to the platform, the platform receives the prompt word, and the prompt word is used as a target prompt word to conduct intelligent drawing, so that a first sample image is obtained.
As described in the foregoing step S101, the target prompt words should be prompt words for describing the target scene and the shooting parameters of the target image acquisition device, where the target scene is the scene to which the trained machine vision model is to be applied. For example, if the user needs to train a machine vision model for identifying fires, the target scene is a scene in which a fire occurs; if the user needs to train a machine vision model for identifying vehicles on a road in bad weather, the target scene is a scene in which bad weather and a vehicle are present, such as a vehicle traveling on a road in rainy weather. It will be appreciated that the target scene is not a scene that actually occurred, but a scene conceived by the user. Moreover, because intelligent drawing based on target prompt words occupies certain system resources, while images of common scenes can be obtained relatively easily by shooting, the target scene should preferably be a rare scene.
The shooting parameters of the target image acquisition device may be the shooting parameters of an image acquisition device that could hypothetically shoot the target scene. Continuing the above examples, for the scene in which a fire occurs, they may be the shooting parameters of a camera assumed to be deployed in the building where the fire occurs; for the scene in which a vehicle travels on a road in thunderstorm weather, they may be the shooting parameters of a camera assumed to be deployed on that road.
The prompt words may describe the target scene and the shooting parameters of the target image acquisition device from various aspects. For the target scene, the prompt words may describe the time, weather and season in which the target scene occurs. Taking the scene of a vehicle traveling on a road in thunderstorm weather as the target scene, for example, the prompt words may include night, thunderstorm and summer, which describe that the target scene occurs at night, in summer, in thunderstorm weather. The prompt words may also describe the objects present in the target scene and the style of the background; for example, the prompt words may include car, bicycle and road, which describe that cars and bicycles are present in the target scene and that the background is a road.
For the shooting parameters of the target image acquisition device, the prompt words may describe the shooting angle, imaging quality, lens type, picture style and so on of the target image acquisition device. Taking the target image acquisition device as a camera deployed on a road as an example, since such a camera usually has a high-definition wide-angle lens and is usually mounted high up, looking down at the road, the prompt words may include top-down view, high definition, wide angle and realistic.
The prompt words above describe features that the target scene and the target image acquisition device do possess, so they are referred to herein as forward prompt words. In some possible embodiments, the user may input only forward prompt words; in other possible embodiments, the user may additionally input prompt words (hereinafter referred to as reverse prompt words) that describe features the target scene and the shooting parameters of the target image acquisition device do not possess; in still other possible embodiments, the user may input only reverse prompt words. Reverse prompt words are described by way of example below.
For convenience of description, the target scene is still taken as a scene of a vehicle traveling on a road in thunderstorm weather, and the target image acquisition device as a camera deployed on the road. Reverse prompt words used to describe the target scene may describe the time, weather and season in which the target scene is less likely (or unlikely) to occur; for example, they may include daytime, sunny and winter (thunderstorms are less likely in winter). They may also describe objects and background styles that are less likely (or unlikely) to exist in the target scene; for example, they may include train, truck (some roads prohibit trucks from traveling), landscape painting, illusion and spliced scene.
Reverse prompt words used to describe the shooting parameters of the target image acquisition device may describe the shooting angle, imaging quality, lens type, picture style and so on that the target image acquisition device is less likely (or unable) to have. For example, considering that a camera deployed on a road cannot shoot vehicles from an ultra-close or selfie perspective, the reverse prompt words may include upward view, elevation angle, ultra-close view and selfie. Also, considering that cameras deployed on roads tend to have small apertures and the captured images tend to be clear, watermark-free and realistic, the reverse prompt words may include large aperture, blur, watermark, low quality, sketch, nausea, distortion, cartoon, sirocco and the like.
In order to distinguish forward prompt words from reverse prompt words, as shown in fig. 2b, the input window 211 in the picture generation interface may include a forward input window 211a and a reverse input window 211b; a prompt word entered by the user in the forward input window 211a is regarded as a forward prompt word, and a prompt word entered in the reverse input window 211b is regarded as a reverse prompt word.
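To make the role of forward and reverse prompt words concrete, the following is a minimal sketch of how they might be fed to a text-to-image pipeline as positive and negative prompts. The disclosure does not specify which drawing model the platform uses; the open-source diffusers library, the Stable Diffusion model ID and the example word lists below are assumptions made only for illustration.

```python
# Sketch only: the platform's drawing backend is not specified in the disclosure;
# the open-source Stable Diffusion pipeline is used here as a stand-in.
import torch
from diffusers import StableDiffusionPipeline

# Forward prompt words: features the target scene / camera should have (cf. fig. 2b).
forward_words = ["night", "thunderstorm", "summer", "car", "road",
                 "top-down view", "high definition", "wide angle", "realistic"]
# Reverse prompt words: features the target scene / camera should not have.
reverse_words = ["daytime", "sunny", "winter", "selfie", "ultra-close view",
                 "blur", "watermark", "low quality", "cartoon"]

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe(prompt=", ".join(forward_words),
             negative_prompt=", ".join(reverse_words)).images[0]
image.save("first_sample_image.png")  # one candidate first sample image
```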
It will be appreciated that although the target scene and the shooting parameters of the target image acquisition device depend on the actual needs of the user, the user may not know how to describe them accurately. Therefore, in one possible embodiment, guidance may be provided to assist the user in entering the prompt words.
For example, the platform provider may send a Prompt document to the user terminal in advance, in which prompt words for describing various scenes and prompt words for describing the shooting parameters of various image acquisition devices are recorded. These prompt words are collectively referred to as candidate prompt words hereinafter; the candidate prompt words may include only forward prompt words, only reverse prompt words, or both forward and reverse prompt words.
By reading the Prompt document on the user terminal, the user can learn from the candidate prompt words how to accurately describe the target scene and the shooting parameters of the target image acquisition device. The user may select, from the candidate prompt words, prompt words suitable for describing the target scene and the shooting parameters of the target image acquisition device and enter them into the input window 211, or may, by analogy with the candidate prompt words, list prompt words capable of describing the target scene and the shooting parameters of the target image acquisition device and enter them into the input window 211.
The candidate prompt words in the Prompt document may be recorded in the form of a table, or in other forms such as images or plain text. For convenience of description, only the table form is taken as an example; examples of possible candidate prompt words are shown in Table 1:
Table 1. Examples of alternative prompt words in the Prompt document
In some embodiments, the Prompt document may be used only to guide the user in entering prompt words; in other embodiments, it may also guide the user through operations subsequent to entering prompt words, such as the picture-to-video conversion, video stream management and server management mentioned below. Since the Prompt document is used to guide the user, the interface in which it is presented is referred to herein as a guide interface. How the user enters the guide interface is described by way of example below.
In the case where the platform provider sends the Prompt document to the user terminal in advance, the user locates the document through the file management system of the user terminal and inputs an opening operation for it; in response to the opening operation, the user terminal opens the document with its local document browsing application, and at this time the user terminal displays the guide interface.
In another possible embodiment, as shown in fig. 2c, the aforementioned picture generation interface further includes a help control 213. The user may input an interactive operation for the help control 213; in response to the interactive operation, the user terminal sends a guidance request to the platform, and in response to the guidance request, the platform sends the Prompt document to the user terminal, which displays the guide interface according to the received document.
The above has described how the user inputs prompt words through the picture generation interface to control the platform to perform intelligent drawing and obtain the first sample image, i.e., the foregoing steps S101 and S102. It can be appreciated that the user may control the platform to intelligently draw only one first sample image, or may control it to draw a plurality of first sample images. In the latter case, after controlling the platform to draw one first sample image as described above, the user can control the user terminal to return to the picture generation interface and control the platform to draw a second first sample image in the same way, and so on, until the platform has drawn a sufficient number of first sample images, after which the platform is controlled to synthesize a sample video based on the drawn first sample images. Moreover, the target prompt words input each time the user controls the platform to draw a first sample image should not be completely identical, so that the machine vision model can learn the features of the target scene during training. For example, taking a fire scene as the target scene, the target prompt words input when drawing the first first sample image may be: daytime, office, flame, no smoke; for the second first sample image: daytime, bedroom, flame, no smoke; for the third first sample image: daytime, office, flame, smoke; and for the fourth first sample image: night, office, flame, no smoke.
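To illustrate how the prompt word groups above could each yield a distinct first sample image, the following sketch simply performs one drawing per group. As in the previous sketch, the use of the diffusers Stable Diffusion pipeline is an assumption for illustration only and is not the drawing service actually used by the platform.

```python
# Sketch: one intelligent drawing per prompt word group, so that the resulting
# first sample images differ from one another. The drawing backend is assumed.
import torch
from diffusers import StableDiffusionPipeline

prompt_groups = [
    ["daytime", "office", "flame", "no smoke"],
    ["daytime", "bedroom", "flame", "no smoke"],
    ["daytime", "office", "flame", "smoke"],
    ["night", "office", "flame", "no smoke"],
]

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

for i, group in enumerate(prompt_groups, start=1):
    image = pipe(prompt=", ".join(group)).images[0]
    image.save(f"fire_sample_{i}.png")  # the i-th first sample image
```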
The flow of how the platform synthesizes the first sample image to obtain the sample video will be described below.
In one possible embodiment, after the platform obtains the first sample image, it may automatically synthesize the first sample image, as video frames, into a sample video. However, in some application scenarios the user has certain requirements on the synthesized sample video that the platform can hardly know in advance, so a sample video synthesized automatically by the platform may not meet the user's actual needs. Based on this, in one possible embodiment, after obtaining the first sample image the platform sends it to the user terminal, and the user terminal receives and stores it, so that the user can control the platform to synthesize a sample video according to his or her own needs based on the first sample image stored locally at the user terminal. How the user controls the user terminal to store the first sample image, and how the user controls the platform to synthesize the sample video, are described below:
In response to the first sample image sent by the platform, the user terminal displays the sample image display interface shown in fig. 3a, in which the first sample image is shown. The user can browse the first sample image to determine whether it meets the actual requirement. If it does, the user inputs a save operation on the sample image display interface to control the user terminal to save the first sample image; for example, as shown in fig. 3b, the user right-clicks the first sample image displayed in the interface to make the user terminal display a right-click context menu, and selects the "save picture as" option in that menu to have the first sample image saved to a path designated by the user. If the first sample image does not meet the requirement, the user can close the sample image display interface, in which case the user terminal does not save the first sample image, and after closing the interface the user can enter the picture generation interface again to control the platform to generate a new first sample image.
It will be appreciated that fig. 3a, 3b are only schematic illustrations of sample image presentation interfaces, and thus the first sample image in fig. 3a, 3b is illustrated in the form of a simple drawing, in fact the first sample image should be a realistic style image, not a simple drawing. In other possible embodiments, the user terminal may also automatically save the first sample image in the preset storage path in response to the first sample image sent by the platform, but the saved first sample image may not meet the actual requirement of the user due to the fact that the user cannot preview the first sample image, so that the user cannot train to obtain an effective machine vision model, and the user can find the first sample image which does not meet the actual requirement in time through the sample image display interface shown in fig. 3a and 3b, so that the control platform regenerates the first sample image, thereby effectively reducing the possibility that the user cannot train to obtain an effective machine vision model in the follow-up.
After saving the first sample image, the user terminal may automatically or under the control of the user present a video composition interface, for example, the user terminal may present the video composition interface in response to a picture-to-video operation input by the user, and the video composition interface may include a picture selection control 411 and a composition control 412 as shown in fig. 4 a.
The user may input an interactive operation for the picture selection control 411 to select an image locally stored in the user terminal, and input an interactive operation for the composition control 412 to control the user terminal to send the image selected by the user to the platform, and the platform receives the images and composes the images into a sample video by using the images as video frames.
In this embodiment, the user-selected image should include at least a first sample image, and the user may select one or more first sample images, and the user may select one or more other sample images (hereinafter referred to as second sample images) in addition to the first sample image. The second sample image may be an image obtained by photographing or may be an image synthesized by any picture synthesis technique, and the second sample image should be an image that can help to train the machine vision model. For example, assuming that the user needs to train a machine vision model for recognizing a fire, it is understood that the process of training the machine vision model for recognizing a fire requires not only the use of an image of a fire occurring in a screen but also the use of an image of a fire not occurring in a screen, and thus the second sample image may be an image obtained by photographing a real scene of a fire, an image obtained by photographing a real building of a fire not occurring, or an image of a building of a fire not occurring by computer rendering.
In other possible embodiments, the platform may save the first sample image after obtaining the first sample image, in which case, the user may also select only the second sample image to control the user terminal to send the second sample image to the platform, and the platform receives the second sample image and reads the locally saved first sample image, and synthesizes the second sample image and the first sample image into a video frame to obtain a sample video.
Since the interactive operations input for the picture selection control 411 and the composition control 412 are used to control the user terminal to transmit the image to the platform, these interactive operations can be regarded as image upload operations. An exemplary explanation of how the user performs the image upload operation will be described below:
still referring to fig. 4a, the user clicks the picture selection control 411 therein, and in response to the user clicking, the user terminal displays a file management interface as shown in fig. 4b, through which the user can browse files stored in each storage path of the user terminal and select the first sample image or the second sample image therefrom.
For example, assuming that the user wants to select the first sample image and has previously controlled the user terminal to store it under the path "C:\picture\ai generation picture", the user can navigate to that path through the file management interface shown in fig. 4b, which then shows each file stored under the path. The user finds and selects the first sample image and clicks the confirm button; in response to the click, the user terminal records the image selected by the user, closes the file management interface and re-displays the video synthesis interface, which may now appear as shown in fig. 4a or in fig. 4c, so that the user can conveniently browse the images already selected.
The user clicks the picture selection control 411 in the video composition interface again, and in response to the user clicking, the user terminal again displays the file management interface as shown in fig. 4b, and the user selects the first sample image or the second sample image again through the file management interface, and so on, until the user selects all the first sample images and the second sample images to be selected, at which time the video composition interface may be as shown in fig. 4 d.
Through the respective first and second sample images shown in fig. 4d, the user confirms that all the first and second sample images to be selected have been selected, and clicks the composition control 412, and in response to the user clicking, the user terminal transmits the first and second sample images selected by the user to the platform, which receives the images and composes the sample video with the images as video frames.
By adopting the embodiment, the user is allowed to additionally select the second sample image for synthesizing the sample video, and the information in the sample video is effectively enriched, so that the machine vision model learns more knowledge in the training process, and the trained machine vision model is more accurate.
Limited by various conditions, the model training end may have certain requirements on the format of the video stream, such as requiring the resolution of the video stream to be a specified resolution, or requiring the code rate of the video stream not to exceed a preset code rate threshold. Since the video stream is obtained by pushing the sample video, the format of the video stream depends to some extent on the format of the sample video.
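One straightforward way to check whether a given video stream already satisfies such requirements is to probe it before handing its address to the model training end. The sketch below uses ffprobe for this purpose; ffprobe itself, the stream address and the threshold values are assumptions made only for illustration and are not prescribed by the disclosure.

```python
# Sketch: verify that a pushed video stream meets example format requirements
# (specified resolution, code rate ceiling). Tool, URL and limits are assumed.
import json
import subprocess

STREAM_URL = "rtmp://push-server.example.com/live/sample_stream"  # hypothetical

out = subprocess.run(
    ["ffprobe", "-v", "error", "-select_streams", "v:0",
     "-show_entries", "stream=width,height,bit_rate",
     "-of", "json", STREAM_URL],
    capture_output=True, text=True, check=True,
)
stream = json.loads(out.stdout)["streams"][0]

assert (int(stream["width"]), int(stream["height"])) == (800, 600)
# bit_rate may be absent for some live streams; 0 is used as a fallback here
assert int(stream.get("bit_rate", 0)) <= 2_000_000  # example 2 Mbps ceiling
```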
In order to enable the video stream to meet the requirements of the model training end and thus allow the machine vision model to be trained, in one possible embodiment the video synthesis interface further includes a format configuration control 413, as shown in fig. 4e. When the format of the synthesized sample video needs to be configured, the user clicks the format configuration control 413; in response, the user terminal displays the format configuration interface shown in fig. 4f, which includes the indexes the user is allowed to configure and an input box for each index. The user configures an index by entering a value in its input box; for example, 800×600 can be entered in the input box to the right of the resolution index. Through the format configuration interface shown in fig. 4f, the user can configure the duration of each sample image, the resolution of the video, the code rate of the video, the encoding format of the video and the frame rate of the video.
The resolution, code rate, encoding format, frame rate among the above-mentioned indexes are conventional indexes of video, and thus are not explained here. Before explaining the duration, the sample video synthesized in the application is explained so as to understand the meaning of the duration.
In the application, the sample video is a video which sequentially displays each sample image in a form similar to a slide show, that is, the sample video is a video composed of a plurality of video clips, each video clip corresponds to one sample image, different video clips correspond to different sample images, and each video frame in the video clip is a sample image corresponding to the video clip. The duration of a video segment corresponding to a sample image is referred to herein as the duration of the sample image, and the duration of a sample image may be considered as the duration of a sample video showing the sample image.
In one possible embodiment, the duration of each sample image is the same, at which point the user need only configure one duration. In another possible embodiment, the duration of each sample image may be different, in which case the user needs to configure the duration separately for each sample image.
Taking the video composition interface shown in fig. 4e as an example, the sample image selected by the user in this example includes sample images 1-10, assuming that the user needs to configure the duration of sample image 1 to be 3s, the user may select sample image 1 in the video composition interface, click on format configuration control 413, and in response to the user clicking, the user terminal detects that the user has selected the sample image, and thus presents the format configuration interface shown in fig. 4e, in which the user configures the duration to be 3s.
It should be understood that fig. 4f and fig. 4e are only two possible format configuration interfaces provided in the present application, in other possible embodiments, the indicators shown in the format configuration interfaces may include only the duration of the sample image, the resolution of the video, the code rate of the video, the encoding format of the video, and some indicators in the frame rate of the video, and may further include other indicators, and the format configuration interfaces may also allow the user to input a numerical value in the form of a drop-down menu, a scroll bar, and the like, where fig. 4f and fig. 4e are not limited in any way.
In the case where the user has performed configuration through the format configuration interface, when the user clicks the synthesis control 412, the user terminal sends to the platform not only the sample images selected by the user but also configuration information representing the values the user configured for each index, so that the platform synthesizes a sample video that meets the constraints of the configuration information. For example, if the configuration information indicates that the user configured the resolution to 800×600, the platform synthesizes a sample video with a resolution of 800×600.
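The sketch below shows one way the configured indexes could be applied when composing the sample video: each selected sample image is written repeatedly so that it stays on screen for its configured duration, at the configured resolution and frame rate. OpenCV and the mp4v codec are used here only as illustrative choices; the disclosure does not specify which library or encoding the platform actually uses, and code rate control would typically require an additional transcoding step not shown.

```python
# Sketch, not the platform's actual implementation: compose sample images into a
# slide-show style sample video honouring user-configured format indexes.
import cv2

def compose_sample_video(image_paths, out_path="sample_video.mp4",
                         resolution=(800, 600), fps=25, seconds_per_image=3):
    fourcc = cv2.VideoWriter_fourcc(*"mp4v")            # assumed encoding format
    writer = cv2.VideoWriter(out_path, fourcc, fps, resolution)
    for path in image_paths:
        frame = cv2.resize(cv2.imread(path), resolution)
        # repeat the frame so the image is shown for its configured duration
        for _ in range(int(seconds_per_image * fps)):
            writer.write(frame)
    writer.release()

# hypothetical file names for a first sample image and a second sample image
compose_sample_video(["first_sample_image.png", "second_sample_image.png"])
```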
The embodiment is selected and used, so that a user is allowed to configure the format of the sample video according to actual demands, and a video stream generated based on the sample video can meet the requirements of various model training ends, and the applicability of the machine vision model training method is effectively improved.
The above has described how the platform synthesizes the sample video, i.e. the aforementioned step S103 has been described. The following will explain how the platform performs plug flow after synthesizing the sample video.
In one possible embodiment, the platform may automatically start pushing after synthesizing the sample video, the pushed video stream being obtained by playing the sample video in a loop. Assuming the total duration of the sample video is 20s, seconds 0-20 of the video stream correspond to the first playback of the sample video, seconds 20-40 to the second playback, and so on.
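One common way to realise such cyclic pushing is to let the push server loop the sample video into a live stream with FFmpeg, as sketched below. FFmpeg, the RTMP protocol and the push address are assumptions for illustration; the disclosure does not prescribe a particular streaming tool or protocol. The resulting address is what would be handed to the model training end as the video stream address.

```python
# Sketch: loop-push a sample video as a live stream using FFmpeg (assumed tool),
# so the stream looks like a camera continuously filming the same scene.
import subprocess

PUSH_URL = "rtmp://push-server.example.com/live/sample_stream"  # hypothetical

subprocess.run([
    "ffmpeg",
    "-re",                 # read the input at its native frame rate
    "-stream_loop", "-1",  # loop the sample video indefinitely (cyclic push)
    "-i", "sample_video.mp4",
    "-c", "copy",          # no re-encoding, just repackaging
    "-f", "flv",           # container expected by RTMP servers
    PUSH_URL,
], check=True)
```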
Considering that the user may need to control when the platform performs pushing according to his own actual requirements, in another possible embodiment the platform may store the sample video after synthesizing it. In this embodiment the platform may store not only the synthesized sample video but also other videos. For example, the user may obtain videos usable for training the machine vision model through means other than video synthesis (such as shooting), so as to enrich the training samples and improve the accuracy of the trained machine vision model, and the user may control the user terminal to upload these videos to the platform for storage through a video uploading operation. Since these videos also participate in machine vision model training as samples, they are likewise referred to herein as sample videos.
When the user needs to control the platform to perform pushing, the user inputs a pushing operation to the user terminal. In response to the pushing operation, the user terminal sends a video acquisition request to the platform; in response to the request, the platform sends the relevant information of all the sample videos it stores to the user terminal, and the user terminal displays the push interface shown in fig. 5a according to the received information. The push interface shows each sample video together with its relevant information and includes a push control 511.
In the example shown in fig. 5a, the related information includes a video name and a video tag; in other possible embodiments, the related information may also include information such as video duration, video format, frame rate and code rate, and the example shown in fig. 5a is not limiting.
The user selects one or more sample videos in the push interface, clicks the push control 511, and in response to the click of the user, the user terminal identifies the sample videos selected by the user, and sends the identifiers of the sample videos to the platform, and the platform identifies the sample videos represented by the identifiers, and pushes the identified sample videos.
In yet another possible embodiment, after synthesizing the sample video, the platform sends the sample video to the user terminal for storage, when the user needs to control the platform to perform pushing, the user inputs a pushing operation to the user terminal, and in response to the pushing operation, the user terminal displays a pushing interface as shown in fig. 5b, where the pushing interface includes a pushing control 511 and a video selection control 512.
By interacting with the video selection control 512, the user is able to select one or more videos from among videos stored locally at the user terminal. The manner in which the user selects the video is similar to that in which the user selects the image above, except that the selected object is changed from the image to the video, so that reference may be made to the above description of how the user selects the image and the example shown in fig. 4b, which will not be repeated here.
After the user selects the videos, clicking the push control 511, and in response to the clicking by the user, the user terminal uploads the videos selected by the user to the platform, and the platform receives and pushes the videos.
It will be appreciated that the platform may need to serve the training of multiple machine vision models at the same time, and thus may need to push multiple video streams simultaneously, while pushing needs to occupy certain system resources, and the system resources of a single server are limited, so that the platform may include multiple push-capable servers (hereinafter referred to as push servers), and for the case where the platform includes multiple push servers, it is necessary to determine which push server is responsible for pushing. In one possible embodiment, the platform may determine that the push server with the highest priority is responsible for pushing from among all push servers with loads not exceeding a preset load threshold, or the platform may determine that the push server with the lowest load is responsible for pushing from among all push servers. Herein, loads include, but are not limited to, memory loads, disk loads, processor loads, and the like.
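The two selection strategies mentioned above can be outlined as follows; this is only an illustrative sketch, and the field names and threshold value are assumptions rather than part of the original disclosure.

```python
def select_push_server(push_servers, load_threshold=0.8, by_priority=True):
    """push_servers: list of dicts like {"address": ..., "load": 0.3, "priority": 2},
    where "load" may aggregate memory, disk and processor load."""
    if by_priority:
        eligible = [s for s in push_servers if s["load"] <= load_threshold]
        if eligible:
            # highest-priority server among those whose load does not exceed the threshold
            return max(eligible, key=lambda s: s["priority"])
    # otherwise (or as a fallback) the least-loaded server overall
    return min(push_servers, key=lambda s: s["load"])
```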
In a further possible embodiment, the server responsible for pushing the stream may also be specified by the user. Illustratively, the push interface further includes a push server selection control 513; as shown in fig. 5c, clicking the push server selection control 513 expands a drop-down menu in which each push server is shown, and the user selects the push server that should be responsible for pushing from the drop-down menu. In this example, in response to the user clicking the push control 511, the user terminal further identifies the push server selected by the user and sends the identification of that push server to the platform, so that the platform identifies the push server represented by the identification and performs pushing with it.
In the example shown in fig. 5c, the push server is shown in the form of its communication address; in other possible embodiments, the push server may be shown in other forms, such as a number or a name. Compared with a number or a name, a communication address is unique, which helps the user better distinguish different push servers.
By adopting this embodiment, the user is allowed to designate a specific push server for pushing according to his own requirements, so that the user can effectively manage the push server resources of the platform and improve resource utilization. For example, assume that there are 3 push servers in total, denoted push servers 1-3, and none of them is currently pushing, but the user knows from a colleague's work plan that push servers 2 and 3 will subsequently be responsible for pushing multiple video streams; the user may then choose push server 1 for pushing so as to avoid conflicting with the colleague's work plan.
In addition, in order to facilitate the user to better select the push server responsible for push, in a possible embodiment, as shown in fig. 5d, the load of each push server may be correspondingly displayed in a pull-down menu, so as to help the user make a reasonable selection, thereby further improving the resource utilization rate.
The push architecture used in the above examples may differ according to the application scenario, such as RTMP (Real-Time Messaging Protocol) or HLS (HTTP Live Streaming, a streaming media transfer protocol based on the hypertext transfer protocol); in one possible embodiment, an FFmpeg (audio/video processing framework) based architecture is selected to improve the applicability of the video stream.
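By way of illustration only, cyclic pushing of a synthesized sample video over RTMP can be sketched with the FFmpeg command-line tool as below; the stream URL is a placeholder and the exact options are an assumption, not a prescription of the disclosure.

```python
import subprocess

def start_cyclic_push(sample_video_path, rtmp_url="rtmp://push-server/live/sample"):
    # -stream_loop -1 repeats the input indefinitely, so the video stream plays
    # the sample video over and over; -re paces reading at the native frame rate.
    cmd = [
        "ffmpeg", "-re", "-stream_loop", "-1",
        "-i", sample_video_path,
        "-c", "copy",           # no re-encoding; keep the configured format
        "-f", "flv", rtmp_url,  # RTMP commonly carries an FLV container
    ]
    return subprocess.Popen(cmd)
```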
How the platform performs pushing has been described above, i.e. the foregoing step S104. The model training end is often independent of the platform, so after the platform performs pushing, the model training end needs to be able to obtain the video stream address of the video stream; how the model training end obtains the video stream address is described below.
In a possible embodiment, the user may configure the communication address of the model training end in the platform in advance, and illustratively, the foregoing push interface may further include an address input window, where the user may input the address of the model training end, and in this example, in response to clicking the push control 511, the user terminal sends the communication address of the model training end to the platform, and the platform receives the communication address and sends the video stream address to the model training end according to the communication address after pushing.
In another possible embodiment, the platform sends the video stream address to the user terminal after pushing, and the user terminal receives and displays the video stream address. The user inputs a copy operation for the video stream address and controls the user terminal to remotely access the model training end so as to display a model training interface as shown in fig. 6; the user then inputs a paste operation in the window used for inputting the video stream address in the model training interface and clicks the training button. In response to the click, the user terminal sends the video stream address input in the window to the model training end.
As described above, the sample video may include not only the sample video synthesized by the platform but also the sample video obtained by the user through other means, and thus there are cases where there are multiple sample videos, and in this case the platform will push the multiple sample videos separately to form multiple video streams. The video stream addresses of all the video streams in the plurality of video streams may be sent to the model training terminal, or the video stream addresses of only part of the video streams may be sent to the model training terminal.
The complete flow of how the user performs the training of the machine vision model has been described so far, i.e. the foregoing steps S101-S105. The various steps performed by the platform are also described in the above description, but in practice the platform is often composed of a plurality of servers, and the platform is described above as a whole only, so the steps performed by the servers in the platform will be described below in conjunction with the structure of the platform.
Referring to fig. 7a, fig. 7a is a schematic structural diagram of a platform provided in the present application, including a main control server 710, a platform server 720, and a push server 730. The interactions between the user terminal and the servers may be seen in fig. 8, including:
S801, the user terminal sends a target prompt word to the main control server.
S802, the main control server sends target prompt words to the platform server.
S803, the platform server performs intelligent drawing according to the target prompt word to obtain a first sample image.
The platform server is deployed with an algorithm or model for realizing intelligent drawing, such as a stable diffusion model, and uses the deployed algorithm or model to realize intelligent drawing so as to obtain the first sample image (an illustrative sketch of invoking such a model is given after this interaction flow).
S804, the platform server sends the first sample image to the master control server.
S805, the main control server transmits the first sample image to the user terminal.
As described above in relation to video composition, in other possible embodiments, the platform server may not send the first sample image to the user terminal through the master server, but rather store the first sample image.
S806, the user terminal transmits the sample image to the main control server.
The sample image may include only the first sample image, or may include both the first sample image and the second sample image.
S807, the main control server transmits the sample image to the push server.
S808, the push server synthesizes the sample images as video frames to obtain a sample video.
S809, the push server circularly pushes the sample video in the form of a video stream.
And S810, the push server sends the video stream address of the video stream to the main control server.
S811, the master control server sends the video stream address to the user terminal.
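As noted at step S803, the platform server generates the first sample image with an intelligent drawing model such as a stable diffusion model. A minimal sketch using the Hugging Face diffusers library is given below; the library, model identifier and example prompt are illustrative assumptions, since the disclosure does not mandate a particular implementation.

```python
import torch
from diffusers import StableDiffusionPipeline

# assumed model checkpoint; any stable-diffusion-style checkpoint would do
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

def intelligent_drawing(target_prompt: str):
    # target_prompt describes the target scene and the shooting parameters of the
    # target image acquisition device, e.g. "fire in an office building at night,
    # wide-angle surveillance camera, low resolution"
    return pipe(target_prompt).images[0]  # the first sample image
```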
For how the user terminal sends the video stream address to the model training end after obtaining the video stream address, refer to the above related description, which is not repeated herein. It should be understood that fig. 7a is only a schematic diagram of one possible architecture of the platform. In another possible embodiment, the main control server and the push server are not directly connected but communicate indirectly through the platform server; the interaction flow between the user terminal and each server is then similar to that of fig. 8, the difference being that the interactions between the main control server and the push server in fig. 8 are implemented indirectly through the platform server. For example, in step S807, the main control server sending the sample image to the push server includes: the main control server sends the sample image to the platform server, and the platform server sends the sample image to the push server.
In yet another possible embodiment, the platform includes not only the three servers shown in fig. 7a described above, but also a proxy server 740 and a cloud storage server 750 as shown in fig. 7b, and in other possible embodiments, only one of the proxy server 740 and the cloud storage server 750 may be included.
The proxy server 740 is used to interact with the platform server 720 by the proxy push server 730. Illustratively, the push server 730 needs to send a video stream address to the platform server 720, and if the proxy server 740 exists, the push server 730 sends the video stream address to the proxy server 740, and the proxy server 740 sends the video stream address to the platform server 720.
It will be appreciated that, as described above, in some application scenarios the platform needs multiple push servers 730, while the network devices within the network to which the main control server 710 and the platform server 720 belong (hereinafter referred to as the first network) are limited, so that few network devices there can be used for pushing; push servers 730 located in other networks (hereinafter referred to as the second network) therefore need to be used. However, the first network is usually an intranet and the second network is usually an extranet, and if a connection between the platform server 720 and extranet network devices were established directly, the intranet would be easily attacked. The interaction between the platform server 720 and the push servers 730 is therefore implemented through the proxy server 740, so that even when extranet push servers 730 are used, the extranet can only obtain the address of the proxy server 740 and thus can only attack the proxy server 740, which effectively protects the security of the platform server 720.
The cloud storage server 750 is used to store various information that the platform needs to store, including but not limited to the first sample image, sample video, etc. stored by the platform as mentioned above.
The interaction between the user terminal and the servers will be exemplarily described with reference to the structure shown in fig. 7b, see fig. 9a, including:
S901, the user terminal sends a target prompt word to the main control server.
S902, the main control server sends target prompt words to the platform server.
S903, the platform server performs intelligent drawing according to the target prompt word to obtain a first sample image.
S904, the user terminal transmits the second sample image to the main control server.
S905, the main control server transmits the second sample image to the platform server.
S906, the platform server transmits the first sample image and the second sample image to the cloud storage server.
For convenience of description, the first sample image and the second sample image are collectively referred to as a sample image hereinafter.
S907, the cloud storage server stores each sample image.
S908, the cloud storage server sends the external link of each sample image to the platform server.
The external links are used to download resources from the cloud storage server; for example, the external link of each sample image is used to download that sample image from the cloud storage server (a download sketch is given after this interaction flow).
S909, the user terminal transmits the configuration information to the main control server.
For configuration information, see the relevant description of how the user controls the platform to compose the video, which is not repeated here.
S910, the master control server sends configuration information to the platform server.
S911, the platform server transmits external links and configuration information of each sample image to the proxy server.
S912, the proxy server acquires the sample image from the cloud storage server according to the external link of the sample image.
And S913, the proxy server sends the sample images and the configuration information to the push server.
S914, the push server synthesizes the sample images as video frames to obtain a sample video conforming to the constraints of the configuration information.
S915, the push server circularly pushes the sample video in the form of a video stream.
S916, the push server sends the video stream address to the proxy server.
S917, the proxy server sends the video stream address to the platform server.
S918, the platform server sends the video stream address to the master control server.
S919, the master control server sends the video stream address to the user terminal.
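As mentioned at step S908, the proxy server downloads stored resources from the cloud storage server through their external links (step S912). A minimal sketch of such a download is given below, assuming the external link is an ordinary HTTP(S) URL; the function name and parameters are illustrative.

```python
import requests

def fetch_by_external_link(external_link: str, timeout_s: int = 30) -> bytes:
    """Download a resource (e.g. a sample image or a sample video) from the
    cloud storage server via its external link."""
    response = requests.get(external_link, timeout=timeout_s)
    response.raise_for_status()
    return response.content
```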
Fig. 9a shows the interactions between the user terminal and the servers from picture generation to video composition to pushing. As described above, the platform in the present application may push not only the synthesized sample video but also a sample video uploaded by the user through the user terminal; only the former interaction is shown in fig. 9a, so the latter is described below. In order to implement the pushing of a sample video uploaded by the user through the user terminal, the interaction flow further includes:
And S920, the user terminal sends the sample video to the master control server.
S921, the master server sends the sample video to the platform server.
S922, the platform server sends the sample video to the cloud storage server.
S923, the cloud storage server stores the sample video.
S924, the cloud storage server sends an external link of the sample video to the platform server.
S925, the platform server sends the external link of the sample video to the proxy server.
S926, the proxy server acquires the sample video from the cloud storage server according to the external link of the sample video.
And S927, the proxy server sends the sample video to the push server.
After the push server obtains the sample video, the user terminal can obtain the video stream address in the same manner as in the foregoing steps S915 to S919, thereby enabling the training of the machine vision model. By adopting this embodiment, the user can obtain a sample video through video synthesis and can also use a video local to the user terminal as a sample video, which effectively enriches the number of samples and further improves the accuracy of the trained machine vision model. Meanwhile, the proxy server acquires the sample video from the cloud storage server and forwards it to the push server, which avoids direct communication between the push server and the cloud storage server and effectively improves the security of the cloud storage server.
As in the previous examples of fig. 9a and 9b, the cloud storage server 750 will store the sample video and sample image sent by the platform server 720, and in other possible embodiments, the cloud storage server 750 may also be used to store other information sent by the platform server 720. For convenience of description, information that the platform server 720 transmits to the cloud storage server 750 to store is referred to herein as information to be stored.
In one possible embodiment, cloud storage server 750 may store this information to be stored under a random path. In another possible embodiment, to facilitate management of the information stored in cloud storage server 750, cloud storage server 750 may store such information under a designated path (hereinafter referred to as a target storage path). The target storage path may be preconfigured by the platform provider or the user. Illustratively, the user may control the user terminal to send a storage path adding instruction to the main control server 710, the main control server 710 sends the storage path adding instruction to the platform server 720, and the platform server 720 identifies the storage path indicated by the instruction as a target storage path; when the platform server 720 subsequently sends information to be stored to the cloud storage server 750 for storage, it controls the cloud storage server 750 to store that information under the target storage path. The user may configure only one target storage path or multiple target storage paths, and may also delete one or more configured target storage paths as desired. The manner of deleting a target storage path is the same as that of adding one and will not be described again here.
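A minimal sketch of the target-storage-path handling described above is given below; the class and method names are illustrative assumptions. Adding a path corresponds to the storage path adding instruction, and information to be stored is only written under a configured target storage path.

```python
import os

class TargetPathStore:
    def __init__(self):
        self.target_paths = set()  # configured by the platform provider or the user

    def add_target_path(self, path: str):
        self.target_paths.add(path)

    def delete_target_path(self, path: str):
        self.target_paths.discard(path)

    def store(self, target_path: str, name: str, data: bytes) -> str:
        # information to be stored goes under a configured target storage path
        if target_path not in self.target_paths:
            raise ValueError(f"{target_path} is not a configured target storage path")
        os.makedirs(target_path, exist_ok=True)
        full_path = os.path.join(target_path, name)
        with open(full_path, "wb") as f:
            f.write(data)
        return full_path
```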
In the flow shown in fig. 9b, step S925 may be performed automatically by the platform server upon receiving the external link of the sample video, and in another possible embodiment, the platform server does not perform step S925 automatically after receiving the external link of the sample video, but performs step S925 under the control of the user. Illustratively, when the user needs the platform to push, the user terminal sends a push instruction to the main control server, and in response to the push instruction, the main control server sends a push instruction to the platform server, and in response to the push instruction, the platform server executes step S925. The user may control the user terminal to send the push command to the main control server by clicking the push control in the push interface, or may control the user terminal to send the push command to the main control server by other methods, which will not be described herein.
The user terminal not only can send a push command to the main control server, but also can send a push stopping command to the main control server, and the main control server sends the push stopping command to the push server through the platform server and the proxy server so as to control the push server to stop push.
It can be understood that pushing the video stream will occupy a certain system resource, and if there is a video stream that can be used for training the machine vision model required by the user in the video stream being pushed by the platform, the user does not need to control the platform to push a new video stream, thereby effectively reducing the occupation of platform resources.
Referring to fig. 10a, fig. 10a is a schematic diagram of a video stream management interface provided in the present application. The user may input a video stream management operation to the user terminal to control the user terminal to display the interface shown in fig. 10a, which shows thumbnail information of each video stream being pushed. The thumbnail information is obtained by the user terminal from the platform; the user terminal may obtain it from the platform in response to the video stream management operation, or may obtain it from the platform periodically.
The information included in the thumbnail information may be different according to application scenes, but the thumbnail information should be at least capable of describing a scene in which a video stream is presented, and for example, assuming that a picture of a fire scene is included in one video stream, the thumbnail information of the video stream should be capable of reflecting that the video stream presents the fire scene.
In the example shown in fig. 10a, the thumbnail information includes a tag, and the scene presented by the video stream is described by the tag, for example, the video stream includes a picture that a vehicle runs on a road at night in a thunderstorm day, and the tag of the video stream may include "night", "bad weather", "vehicle". Because the video stream is obtained by pushing the sample video, the scene presented by the video stream is consistent with the scene presented by the sample video, so that the label of the sample video can be determined as the label of the video stream, and the label of the sample video can be configured by a user when the control platform synthesizes the sample video or can be configured by the user when the sample video is uploaded to the platform. The thumbnail information may be used to describe other attributes of the video stream, including but not limited to code rate, resolution, format, etc. of the video stream, in addition to describing the scene in which the video stream is presented.
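A possible shape for the thumbnail information exchanged between the platform and the user terminal is sketched below; the field set is an assumption chosen to match the attributes mentioned above (tags describing the presented scene, plus optional code rate, resolution, format and stream status).

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class VideoStreamThumbnail:
    video_stream_address: str
    tags: List[str] = field(default_factory=list)  # e.g. ["night", "bad weather", "vehicle"]
    code_rate_kbps: int = 0
    resolution: str = ""
    video_format: str = ""
    status: str = "pushing"  # "pushing" or "stopped", as in the example of fig. 10c
```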
According to the thumbnail information displayed in the video stream management interface, the user can know whether a video stream meeting his requirements exists. If so, the user can select that video stream in the video stream management interface; in response to the selection, the user terminal displays the video stream address of the selected video stream, and the user controls the user terminal to send the video stream address to the model training end by copying and pasting, so as to train the machine vision model.
Among the video streams being pushed there may be video streams that can meet the user's needs, and there may also be video streams that cannot, for example because the user has already used video stream A to train the machine vision model sufficiently, so that video stream A no longer meets the user's needs. Maintaining the push of such video streams wastes system resources; based on this, the present application also allows the user to control the platform through the video stream management interface to stop pushing a particular video stream.
For example, as shown in fig. 10b, the video stream management interface further includes a stop push button after the thumbnail information of each video stream, and the user only needs to click the stop push button to control the platform to stop pushing the corresponding video stream. Taking the example shown in fig. 10b, if the user no longer needs video stream 2, the user may click the stop push button after the thumbnail information of video stream 2; in response to the click, the user terminal sends a stop push instruction for video stream 2 to the platform, so as to control the platform to stop pushing video stream 2.
In addition, the video stream management interface may show not only the thumbnail information of the video streams being pushed, but also the thumbnail information of video streams that the platform has pushed but has currently stopped pushing, as shown in fig. 10c. If, by browsing the thumbnail information, the user finds a stopped video stream that meets his requirements, the user may click the push button after the thumbnail information of that video stream to control the platform to push it again. For example, the user controls the platform to push video stream 5 for training machine vision model A; after completing the training of machine vision model A, the user controls the platform to stop pushing video stream 5; later the user needs to train machine vision model B and finds, by browsing the thumbnail information, that video stream 5 can also be used for training machine vision model B, so the user clicks the push button after the thumbnail information of video stream 5 to control the platform to push video stream 5 again. In the example shown in fig. 10c, in order to help the user distinguish the video streams being pushed from those that have stopped, a video stream status is added to the thumbnail information; in other possible embodiments, the video stream management interface may also distinguish them in other ways, such as different background colors or different fonts.
Referring again to fig. 7a and 7b, although only one push server is shown in each figure, as described above the platform often requires multiple push servers. If too many push servers are configured for the platform, server resources may be wasted; if too few are configured, it may be difficult for each push server to push effectively. The number of push servers therefore needs to be configured reasonably according to the actual needs of users. It is often difficult for the platform provider to predict the number of video streams that users need to push at the same time, so it is difficult for the platform provider to configure a reasonable number of push servers.
Based on this, in one possible embodiment, the platform is provided with a server management function, by which a user can add a push server to or delete a push server from the platform according to actual needs (hereinafter this process is referred to as server management). The server management function is described below from two perspectives: the user operations, and the interactions between the servers of the platform.
The user controls the user terminal to present a server management interface as shown in fig. 11a by inputting a server management operation to the user terminal. The server management interface shows the related information of the push servers that have been added to the platform and includes a server addition control 1101.
The related information of the push server includes a communication address of the push server, and may further include a load of the push server, a person who adds the push server, the number of video streams the push server is pushing, and the like.
In the example shown in fig. 11a, the relevant information of each push server is shown at the same time, in another possible embodiment, as shown in fig. 11b, only the relevant information of the push server selected by the user is shown.
If the user needs to add a new push server to the platform, the server addition control 1101 may be clicked; in response to the click, the user terminal displays a server addition interface as shown in fig. 11c, which includes an address input window. The user inputs the communication address of the push server to be added in the window and clicks the confirm button. In response to the click, the user terminal sends the communication address input by the user to the platform, so that the platform accesses the new push server according to the communication address.
If the user needs to delete a certain push server from the platform, then for the example shown in fig. 11a the user may click the delete button after that push server; in response to the click, the user terminal sends a delete instruction for the push server to the platform, so as to control the platform to remove the push server. For the example shown in fig. 11b, the user may select the push server to be deleted and click the delete button; in response to the click, the user terminal sends a delete instruction for the push server to the platform, so as to control the platform to remove the push server.
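The add and delete operations described above amount to maintaining a registry of push servers keyed by their (unique) communication addresses. A minimal sketch is given below; all names are illustrative, and the actual access procedure (enabling the streaming media service and opening a listening port) is handled via the proxy server as described later.

```python
class PushServerRegistry:
    def __init__(self):
        self.servers = {}  # communication address -> push server record

    def add(self, communication_address: str):
        # the communication address is unique, so it is used as the key
        self.servers[communication_address] = {
            "address": communication_address,
            "load": 0.0,
            "streams": [],
        }

    def delete(self, communication_address: str):
        self.servers.pop(communication_address, None)
```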
The user operations have been described above. However, since the platform was treated as a whole, the interactions between the user terminal and the individual servers of the platform have not yet been described; to describe the embodiments of the present application more clearly, these interactions are described below.
For convenience of description, the interaction flow is described below by taking the platform structure shown in fig. 7b as an example, and the same situation as that of the platform structure shown in fig. 7a is applicable, and will not be repeated.
Referring to fig. 12, fig. 12 shows a flow of how a user terminal obtains relevant information of each push server to display a server management interface, including:
S1201, the user terminal sends a server information acquisition request to the main control server.
The server information acquisition request may be used to request the relevant information of all push servers, or of only some of the push servers. The timing at which the user terminal sends the server information acquisition request may also differ according to the application scenario.
Illustratively, in the foregoing example shown in fig. 11a, the server information acquisition request is sent by the user terminal in response to a server management operation entered by the user, and is used to request the relevant information of all push servers. In the example shown in fig. 11b, the server information acquisition request is sent in response to the user selecting the push server, and is used to request the related information of the push server selected by the user.
S1202, in response to the server information acquisition request, the master server transmits the server information acquisition request to the platform server.
S1203, in response to the server information acquisition request, the platform server transmits the relevant information requested by the server information acquisition request to the main control server.
In response to the server information acquisition request, the platform server may acquire the relevant information of the push servers through the proxy server, or it may periodically acquire the relevant information of each push server from the proxy server and store it locally on the platform server (a polling sketch is given after this flow).
And S1204, the main control server sends the related information to the user terminal.
The user terminal receives the related information and shows the related information in a server management interface.
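The periodic collection of push server loads mentioned at step S1203 can be outlined as follows; fetch_loads and store are caller-supplied callables (assumptions), standing in for the proxy-server query and the platform server's local cache respectively.

```python
import time

def poll_push_server_loads(fetch_loads, store, interval_s=30, rounds=None):
    """fetch_loads() returns {communication_address: {"memory": ..., "disk": ..., "cpu": ...}};
    store(loads) caches the result locally on the platform server."""
    polled = 0
    while rounds is None or polled < rounds:
        store(fetch_loads())
        time.sleep(interval_s)
        polled += 1
```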
Referring to fig. 13 again, fig. 13 shows a flow of adding a new push server, including:
S1301, the user terminal sends the communication address to the master server.
S1302, the master control server sends the communication address to the platform server.
And S1303, the platform server sends the communication address to the proxy server.
S1304, the proxy server accesses the new push server according to the communication address.
S1305, the proxy server interacts with the push server to enable streaming services of the push server.
After enabling the streaming media service, the new push server opens a listening port so that it can subsequently obtain information such as sample videos and external links from the proxy server. For what information the push server needs to obtain from the proxy server, refer to the examples shown in fig. 9a and fig. 9b, which are not described in detail again here. The process of deleting a push server is similar to that of adding one, i.e. similar to the example shown in fig. 13, and is therefore not described again here.
While the machine vision model training function, video stream management function, and server management function provided by the platform have been described above, it is to be understood that these functions are not all functions of the platform, and that the platform provided by the present application may also be provided with other functions, and the above examples are not limited in any way.
The platform may also determine a push server whose load is greater than a preset load threshold to be an abnormal push server, and send an alarm message to a management terminal preset for the abnormal push server, so as to notify the administrator of the abnormal push server to take countermeasures in time, ensure that the push server pushes normally, and avoid affecting the training of the machine vision model.
When the management terminal is a mobile phone, the platform can alarm by sending a short message to the mobile phone or by making a call; when the management terminal is a personal computer, the platform can alarm by sending an alarm message to the personal computer so as to control the personal computer to pop up an alarm window. Sending a short message, making a call and popping up a window are all merely examples of alarm modes; in other possible embodiments, the platform may also alarm in other ways, and the above examples are not limiting.
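An outline of the abnormal-push-server check and alarm is given below; the callable send_alarm stands in for whichever alarm channel (short message, phone call, pop-up window) is configured for the management terminal, and all names are illustrative.

```python
def check_push_servers(push_servers, load_threshold, send_alarm):
    """Treat any push server whose load exceeds the preset threshold as abnormal
    and alarm the management terminal preset for it."""
    abnormal = [s for s in push_servers if s["load"] > load_threshold]
    for server in abnormal:
        send_alarm(
            server["management_terminal"],
            f"push server {server['address']} is overloaded: "
            f"load {server['load']:.2f} > threshold {load_threshold:.2f}",
        )
    return abnormal
```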
The interfaces mentioned above may be windows or web pages, and the styles thereof may be different according to application scenarios, and the illustrations of the various interfaces herein are merely examples for describing the present application more clearly, and do not limit the present application in any way.
In the technical scheme of the disclosure, the processes of collecting, storing, using, processing, transmitting, providing and disclosing the personal information of users involved all comply with the provisions of relevant laws and regulations, and do not violate public order and good morals.
It should be noted that the aforementioned second sample images obtained by photographing are all obtained with the user informed and having given permission, and are not obtained by photographing aimed at a specific person. For example, a second sample image obtained by photographing a vehicle is obtained not by photographing the vehicle of a specific person, but by photographing vehicles of unspecified persons.
As for scenarios involving the collection, storage, use, processing, transmission, provision and disclosure of users' personal information, the machine vision model trained by the method is used to improve public safety; for example, a machine vision model for identifying fire can accurately identify the occurrence of a fire, so that relevant personnel can be notified to handle it in time, reducing the damage caused by the fire to the public.
It should be noted that, the two-dimensional face image in this embodiment is derived from the public data set.
Corresponding to the foregoing machine vision model training method, the present application further provides a machine vision model training device, referring to fig. 14, fig. 14 is a schematic structural diagram of the machine vision model training device provided by the present application, including:
a prompt word acquisition module 1401, configured to acquire a target prompt word for describing a target scene and shooting parameters of a target image acquisition device;
the image generating module 1402 is configured to perform intelligent drawing according to the target prompt word to obtain a first sample image;
a video synthesis module 1403, configured to synthesize the first sample image into a sample video by using the first sample image as a video frame;
A pushing module 1404, configured to circularly push the sample video in a video stream form, and obtain a video stream address of the video stream;
and the deriving module 1405 is configured to send the video stream address to a model training end, so that the model training end trains a machine vision model applicable to the target scene based on the video stream.
In a possible embodiment, the acquiring the target prompt word for describing the target scene and the shooting parameters of the target image capturing device includes:
displaying a guide interface comprising alternative prompt words corresponding to a plurality of scenes and alternative prompt words corresponding to shooting parameters of a plurality of image acquisition devices, wherein the alternative prompt words are used for describing features possessed and/or features not possessed by the corresponding scenes or the shooting parameters of the image acquisition devices;
and responding to the input operation aiming at the guide interface, and identifying the prompt word indicated by the input operation as a target prompt word.
In a possible embodiment, the circularly pushing the sample video in the form of a video stream and acquiring a video stream address of the video stream includes:
displaying the communication address of each pre-configured push server;
In response to an address selection operation for a communication address, identifying the communication address selected by the address selection operation as a target communication address;
sending the sample video to a push server according to the target communication address so as to control the push server to circularly push the sample video in a video stream mode;
and acquiring a video stream address of the video stream fed back by the push server.
In one possible embodiment, the apparatus further comprises:
the load acquisition module is used for respectively acquiring the loads of the push servers from the push servers according to the communication addresses of the push servers which are configured in advance, wherein the loads comprise one or more of the following loads: memory load, disk load, processor load;
the displaying the communication address of each pre-configured push server comprises the following steps:
and correspondingly displaying the communication address and the load of each push server.
In one possible embodiment, the apparatus further comprises:
the abnormal plug-flow server determining module is used for determining a plug-flow server with load larger than a preset load threshold as an abnormal plug-flow server;
and the alarm message sending module is used for sending alarm messages to a management terminal preset for the abnormal push server.
In one possible embodiment, the apparatus further comprises:
the thumbnail information display module is used for responding to the video stream management operation and displaying thumbnail information of each video stream being pushed, wherein the thumbnail information is used for describing scenes presented by the video streams;
the video stream selection module is used for responding to the video stream selection operation aiming at the thumbnail information, identifying the video stream selected by the video stream selection operation, and sending the video stream address of the selected video stream to the model training end so that the model training end trains to obtain the machine vision model based on the selected video stream.
In a possible embodiment, the synthesizing the first sample image as a video frame to obtain a sample video includes:
displaying a video synthesis interface;
responding to an image uploading operation aiming at the video synthesis interface, and acquiring a second sample image uploaded by the image uploading operation;
and synthesizing the first sample image and the second sample image as video frames to obtain a sample video.
In one possible embodiment, the apparatus further comprises:
a configuration information identifying module, configured to identify, in response to a format configuration operation for the video composition interface, configuration information indicated by the format configuration operation, wherein the configuration information is used to constrain one or more of the following parameters of a video: the duration of each sample image in the video, the resolution of the video, the code rate of the video, the coding format of the video and the frame rate of the video;
The step of synthesizing the first sample image and the second sample image as video frames to obtain a sample video comprises the following steps:
and synthesizing the first sample image and the second sample image as video frames to obtain the sample video conforming to the constraint of the configuration information.
In a possible embodiment, the synthesizing the first sample image as a video frame to obtain a sample video includes:
responding to the operation of converting the picture into the video, and synthesizing the first sample image as a video frame to obtain a sample video;
the method further comprises the steps of:
and responding to the video uploading operation, and acquiring the video uploaded by the video uploading operation as a sample video.
In a possible embodiment, the acquiring the target prompt word for describing the target scene and the shooting parameters of the target image capturing device includes:
acquiring a plurality of groups of target prompt words used for describing shooting parameters of a target scene and target image acquisition equipment, wherein each group of target prompt words are different;
the intelligent drawing is carried out according to the target prompt word to obtain a first sample image, which comprises the following steps:
respectively carrying out intelligent drawing according to each group of target prompt words to obtain a plurality of first sample images;
The step of synthesizing the first sample image as a video frame to obtain a sample video includes:
and synthesizing the plurality of first sample images as video frames to obtain a sample video.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 15 illustrates a schematic block diagram of an example electronic device 1500 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 15, the apparatus 1500 includes a computing unit 1501, which can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 1502 or a computer program loaded from a storage unit 1508 into a Random Access Memory (RAM) 1503. In the RAM1503, various programs and data required for the operation of the device 1500 may also be stored. The computing unit 1501, the ROM1502, and the RAM1503 are connected to each other through a bus 1504. An input/output (I/O) interface 1505 is also connected to bus 1504.
Various components in device 1500 are connected to I/O interface 1505, including: an input unit 1506 such as a keyboard, mouse, etc.; an output unit 1507 such as various types of displays, speakers, and the like; a storage unit 1508 such as a magnetic disk, an optical disk, or the like; and a communication unit 1509 such as a network card, modem, wireless communication transceiver, etc. The communication unit 1509 allows the device 1500 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunications networks.
The computing unit 1501 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 1501 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The computing unit 1501 performs the various methods and processes described above, such as a machine vision model training method. For example, in some embodiments, the machine vision model training method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 1508. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 1500 via the ROM1502 and/or the communication unit 1509. When the computer program is loaded into the RAM1503 and executed by the computing unit 1501, one or more steps of the machine vision model training method described above may be performed. Alternatively, in other embodiments, the computing unit 1501 may be configured to perform the machine vision model training method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs, the one or more computer programs may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the disclosed aspects are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (27)

1. A machine vision model training method, comprising:
acquiring a target prompt word for describing a target scene and shooting parameters of target image acquisition equipment;
intelligent drawing is carried out according to the target prompt word, and a first sample image is obtained;
synthesizing the first sample image as a video frame to obtain a sample video;
circularly pushing the sample video in a video stream mode, and acquiring a video stream address of the video stream;
And sending the video stream address to a model training end so that the model training end trains a machine vision model applicable to the target scene based on the video stream.
2. The method of claim 1, wherein the obtaining the target cue word for describing the target scene and the shooting parameters of the target image capturing device comprises:
displaying a guide interface comprising alternative prompt words corresponding to a plurality of scenes and alternative prompt words corresponding to shooting parameters of a plurality of image acquisition devices, wherein the alternative prompt words are used for describing features possessed and/or features not possessed by the corresponding scenes or the shooting parameters of the image acquisition devices;
and responding to the input operation aiming at the guide interface, and identifying the prompt word indicated by the input operation as a target prompt word.
3. The method of claim 1, wherein the cyclically pushing the sample video in the form of a video stream and obtaining a video stream address of the video stream comprises:
displaying the communication address of each pre-configured push server;
in response to an address selection operation for a communication address, identifying the communication address selected by the address selection operation as a target communication address;
Sending the sample video to a push server according to the target communication address so as to control the push server to circularly push the sample video in a video stream mode;
and acquiring a video stream address of the video stream fed back by the push server.
4. A method according to claim 3, further comprising:
according to the communication address of each push server which is pre-configured, the load of each push server is respectively obtained from each push server, wherein the load comprises one or more of the following loads: memory load, disk load, processor load;
the displaying the communication address of each pre-configured push server comprises the following steps:
and correspondingly displaying the communication address and the load of each push server.
5. The method of claim 4, further comprising:
determining a plug-flow server with load larger than a preset load threshold as an abnormal plug-flow server;
and sending an alarm message to a management terminal preset for the abnormal push server.
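Claims 4 and 5 can be illustrated with the hedged sketch below, which polls each push server for its load and alarms the management terminal when a threshold is exceeded. The /load endpoint, the alert endpoint, the addresses, and the threshold are all assumptions made for illustration.

```python
# Sketch: poll push-server loads and raise an alarm on overload.
# Endpoints, addresses, and threshold are hypothetical.
import requests

PUSH_SERVERS = ["http://10.0.0.11:8080", "http://10.0.0.12:8080"]
LOAD_THRESHOLD = 0.85                       # preset load threshold
MANAGEMENT_TERMINAL = "http://admin.example/alert"


def collect_loads() -> dict:
    loads = {}
    for addr in PUSH_SERVERS:
        resp = requests.get(f"{addr}/load", timeout=3)
        # expected shape: {"memory": 0.4, "disk": 0.2, "processor": 0.9}
        loads[addr] = resp.json()
    return loads


def alarm_on_overload(loads: dict) -> None:
    for addr, load in loads.items():
        if max(load.values()) > LOAD_THRESHOLD:
            requests.post(
                MANAGEMENT_TERMINAL,
                json={"abnormal_push_server": addr, "load": load},
                timeout=3,
            )


alarm_on_overload(collect_loads())
```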
6. The method of claim 1, further comprising:
responding to video stream management operation, displaying thumbnail information of each video stream being pushed, wherein the thumbnail information is used for describing a scene presented by the video stream;
And responding to the video stream selection operation aiming at the thumbnail information, identifying the video stream selected by the video stream selection operation, and sending the video stream address of the selected video stream to a model training end so that the model training end trains to obtain a machine vision model based on the selected video stream.
7. The method of claim 1, wherein the synthesizing the first sample image as a video frame to obtain the sample video comprises:
displaying a video synthesis interface;
responding to an image uploading operation aiming at the video synthesis interface, and acquiring a second sample image uploaded by the image uploading operation;
and synthesizing the first sample image and the second sample image as video frames to obtain a sample video.
8. The method of claim 7, further comprising:
in response to a format configuration operation for the video composition interface, identifying configuration information indicated by the format configuration operation, wherein the configuration information is used to constrain one or more of the following parameters of a video: the duration of each sample image in the video, the resolution of the video, the bit rate of the video, the coding format of the video and the frame rate of the video;
the step of synthesizing the first sample image and the second sample image as video frames to obtain a sample video comprises the following steps:
And synthesizing the first sample image and the second sample image as video frames to obtain the sample video conforming to the constraint of the configuration information.
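As a hedged example of claims 7 and 8, the sketch below writes generated and uploaded sample images into one video with OpenCV, taking the per-image duration, resolution, and frame rate from a configuration dictionary. The file names and parameter values are illustrative assumptions.

```python
# Sketch: compose first and second sample images into a sample video under
# configuration constraints (duration per image, resolution, frame rate).
# Requires opencv-python; the image paths are illustrative.
import cv2

config = {
    "seconds_per_image": 2.0,
    "resolution": (1280, 720),   # (width, height)
    "fps": 25,
}

first_sample_images = ["generated_01.png"]    # from intelligent drawing
second_sample_images = ["uploaded_01.jpg"]    # from the image uploading operation


def synthesize(images, out_path, cfg):
    width, height = cfg["resolution"]
    fourcc = cv2.VideoWriter_fourcc(*"mp4v")
    writer = cv2.VideoWriter(out_path, fourcc, cfg["fps"], (width, height))
    frames_per_image = int(cfg["seconds_per_image"] * cfg["fps"])
    for path in images:
        frame = cv2.resize(cv2.imread(path), (width, height))
        for _ in range(frames_per_image):
            writer.write(frame)
    writer.release()
    return out_path


sample_video = synthesize(first_sample_images + second_sample_images,
                          "sample.mp4", config)
```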
9. The method of claim 1, wherein the synthesizing the first sample image as a video frame to obtain the sample video comprises:
in response to a picture-to-video conversion operation, synthesizing the first sample image as a video frame to obtain a sample video;
the method further comprises the steps of:
and responding to the video uploading operation, and acquiring the video uploaded by the video uploading operation as a sample video.
10. The method of claim 1, wherein the acquiring the target prompt word for describing the target scene and the shooting parameters of the target image acquisition device comprises:
acquiring a plurality of groups of target prompt words for describing a target scene and shooting parameters of target image acquisition equipment, wherein each group of target prompt words is different;
the intelligent drawing is carried out according to the target prompt word to obtain a first sample image, which comprises the following steps:
respectively carrying out intelligent drawing according to each group of target prompt words to obtain a plurality of first sample images;
the step of synthesizing the first sample image as a video frame to obtain a sample video includes:
And synthesizing the plurality of first sample images as video frames to obtain a sample video.
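Claim 10 can be illustrated with the hedged sketch below: several distinct groups of prompt words each yield one drawn image, and the images are later joined into a single sample video. The open-source diffusers library is used purely as one possible intelligent-drawing backend; the disclosure does not name any particular model, and the prompt groups are invented examples.

```python
# Sketch: one generated image per group of prompt words (requires diffusers,
# torch, and a CUDA device; model choice and prompts are illustrative).
import torch
from diffusers import StableDiffusionPipeline

prompt_groups = [
    "warehouse interior, small flame near shelves, fixed ceiling camera, wide angle",
    "warehouse interior, heavy smoke, fixed ceiling camera, wide angle, night",
    "warehouse interior, no fire, fixed ceiling camera, wide angle, daytime",
]

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

first_sample_images = []
for i, prompt in enumerate(prompt_groups):
    image = pipe(prompt).images[0]      # one first sample image per group
    path = f"generated_{i:02d}.png"
    image.save(path)
    first_sample_images.append(path)

# first_sample_images would then be synthesized as video frames into one sample video.
```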
11. A machine vision model training system, comprising:
the system comprises a main control server, a platform server and a push server;
the main control server is used for accessing the user terminal, receiving a target prompt word which is sent by the user terminal and used for describing a target scene and shooting parameters of target image acquisition equipment, and sending the target prompt word to the platform server;
the platform server is used for intelligently drawing according to the target prompt words to obtain a first sample image, and feeding the first sample image back to the user terminal through the main control server;
the main control server is also used for receiving the sample image sent by the user terminal and sending the sample image to the push server through the platform server;
the push server is used for synthesizing the sample images as video frames to obtain a sample video, circularly pushing the sample video in a video stream mode, and sending a video stream address of the video stream to the user terminal through the platform server and the main control server, so that the user terminal controls, based on the video stream address, a model training end to train a machine vision model applicable to the target scene.
12. The system of claim 11, further comprising a proxy server;
the main control server and the platform server are positioned in a first network, and the push server is positioned in a second network;
and the proxy server is used for proxy interaction between the push server and the platform server.
13. The system of claim 12, further comprising a cloud storage server;
the main control server is also used for receiving the sample video sent by the user terminal and sending the sample video to the platform server;
the platform server is further used for sending the sample video to the cloud storage server;
the cloud storage server is used for receiving and storing the sample video and sending an external link of the sample video to the platform server;
the platform server is further configured to send the external link to the proxy server;
the proxy server is further configured to obtain the sample video from the cloud storage server based on the external link, and send the sample video to the push server.
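The relay step in claim 13 can be sketched as follows: the proxy server downloads the sample video through its external link and forwards it to the push server. The URLs and the upload endpoint are assumptions made only for illustration.

```python
# Sketch: proxy server fetches the sample video via its external link and
# forwards it to the push server. All URLs are hypothetical.
import requests

EXTERNAL_LINK = "https://cloud-storage.example/buckets/samples/sample.mp4"
PUSH_SERVER_UPLOAD = "http://push-server.internal:8080/upload"


def relay_sample_video(external_link: str, upload_url: str) -> None:
    # Download from the cloud storage server (first network -> proxy).
    video = requests.get(external_link, timeout=30)
    video.raise_for_status()
    # Forward to the push server (proxy -> second network).
    resp = requests.post(
        upload_url,
        files={"file": ("sample.mp4", video.content, "video/mp4")},
        timeout=30,
    )
    resp.raise_for_status()


relay_sample_video(EXTERNAL_LINK, PUSH_SERVER_UPLOAD)
```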
14. The system of claim 13, wherein the master server is further configured to receive a communication address sent by the user terminal, and send the communication address to the cloud storage server for storage through the platform server;
The proxy server is further configured to obtain the communication address from the cloud storage server, access a network device corresponding to the communication address, and control the network device to enable a streaming media service, so that the network device serves as a new push server in the system.
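Claim 14 has the proxy server enable a streaming media service on a user-supplied network device so that it can join the system as a new push server. The sketch below does this over SSH with paramiko and starts an SRS container as one example service; the host, credentials, command, and returned address are all assumptions, not details from the claims.

```python
# Sketch: enable a streaming media service on a network device so it can act
# as a new push server. Host, credentials, and command are hypothetical.
import paramiko


def enable_streaming_service(host: str, user: str, password: str) -> str:
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(host, username=user, password=password, timeout=10)
    # Start an RTMP-capable media server (SRS is used here only as an example).
    _, stdout, stderr = client.exec_command(
        "docker run -d -p 1935:1935 -p 8080:8080 ossrs/srs:5"
    )
    err = stderr.read().decode()
    client.close()
    if err:
        raise RuntimeError(f"failed to enable streaming service: {err}")
    return f"rtmp://{host}:1935/live"   # base push address of the new server


print(enable_streaming_service("10.0.0.20", "ops", "example-password"))
```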
15. A machine vision model training apparatus, comprising:
the prompt word acquisition module is used for acquiring target prompt words for describing a target scene and shooting parameters of the target image acquisition equipment;
the image generation module is used for intelligently drawing according to the target prompt word to obtain a first sample image;
the video synthesis module is used for synthesizing the first sample image serving as a video frame to obtain a sample video;
the pushing module is used for circularly pushing the sample video in a video stream mode and acquiring a video stream address of the video stream;
and the export module is used for sending the video stream address to a model training end so that the model training end trains a machine vision model applicable to the target scene based on the video stream.
16. The apparatus of claim 15, wherein the acquiring the target prompt word for describing the target scene and the shooting parameters of the target image acquisition device comprises:
displaying a guide interface comprising alternative prompt words corresponding to a plurality of scenes and alternative prompt words corresponding to shooting parameters of a plurality of image acquisition devices, wherein the alternative prompt words are used for describing features possessed and/or features not possessed by the corresponding scene or by the shooting parameters of the corresponding image acquisition device;
and responding to the input operation aiming at the guide interface, and identifying the prompt word indicated by the input operation as a target prompt word.
17. The apparatus of claim 15, wherein the circularly pushing the sample video in a video stream mode and acquiring the video stream address of the video stream comprises:
displaying the communication address of each pre-configured push server;
in response to an address selection operation for a communication address, identifying the communication address selected by the address selection operation as a target communication address;
sending the sample video to a push server according to the target communication address so as to control the push server to circularly push the sample video in a video stream mode;
and acquiring a video stream address of the video stream fed back by the push server.
18. The apparatus of claim 17, further comprising:
The load acquisition module is used for respectively acquiring the loads of the push servers from the push servers according to the communication addresses of the push servers which are configured in advance, wherein the loads comprise one or more of the following loads: memory load, disk load, processor load;
the displaying the communication address of each pre-configured push server comprises the following steps:
and correspondingly displaying the communication address and the load of each push server.
19. The apparatus of claim 18, further comprising:
the abnormal push server determining module is used for determining a push server whose load is greater than a preset load threshold as an abnormal push server;
and the alarm message sending module is used for sending alarm messages to a management terminal preset for the abnormal push server.
20. The apparatus of claim 15, further comprising:
the thumbnail information display module is used for responding to the video stream management operation and displaying thumbnail information of each video stream being pushed, wherein the thumbnail information is used for describing scenes presented by the video streams;
the video stream selection module is used for responding to the video stream selection operation aiming at the thumbnail information, identifying the video stream selected by the video stream selection operation, and sending the video stream address of the selected video stream to the model training end so that the model training end trains to obtain the machine vision model based on the selected video stream.
21. The apparatus of claim 15, wherein the synthesizing the first sample image as a video frame to obtain the sample video comprises:
displaying a video synthesis interface;
responding to an image uploading operation aiming at the video synthesis interface, and acquiring a second sample image uploaded by the image uploading operation;
and synthesizing the first sample image and the second sample image as video frames to obtain a sample video.
22. The apparatus of claim 21, further comprising:
a configuration information identifying module, configured to identify, in response to a format configuration operation for the video composition interface, configuration information indicated by the format configuration operation, wherein the configuration information is used to constrain one or more of the following parameters of a video: the duration of each sample image in the video, the resolution of the video, the bit rate of the video, the coding format of the video and the frame rate of the video;
the step of synthesizing the first sample image and the second sample image as video frames to obtain a sample video comprises the following steps:
and synthesizing the first sample image and the second sample image as video frames to obtain the sample video conforming to the constraint of the configuration information.
23. The apparatus of claim 15, wherein the synthesizing the first sample image as a video frame to obtain the sample video comprises:
in response to a picture-to-video conversion operation, synthesizing the first sample image as a video frame to obtain a sample video;
the apparatus is further configured to:
in response to a video uploading operation, acquire the video uploaded by the video uploading operation as a sample video.
24. The apparatus of claim 15, wherein the acquiring the target prompt word for describing the target scene and the shooting parameters of the target image acquisition device comprises:
acquiring a plurality of groups of target prompt words for describing a target scene and shooting parameters of target image acquisition equipment, wherein each group of target prompt words is different;
the intelligent drawing is carried out according to the target prompt word to obtain a first sample image, which comprises the following steps:
respectively carrying out intelligent drawing according to each group of target prompt words to obtain a plurality of first sample images;
the step of synthesizing the first sample image as a video frame to obtain a sample video includes:
and synthesizing the plurality of first sample images as video frames to obtain a sample video.
25. An electronic device, comprising:
At least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-10.
26. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-10.
27. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any of claims 1-10.
CN202410052299.9A 2024-01-12 2024-01-12 Machine vision model training method and system Pending CN117853845A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410052299.9A CN117853845A (en) 2024-01-12 2024-01-12 Machine vision model training method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410052299.9A CN117853845A (en) 2024-01-12 2024-01-12 Machine vision model training method and system

Publications (1)

Publication Number Publication Date
CN117853845A (en)

Family

ID=90539770

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410052299.9A Pending CN117853845A (en) 2024-01-12 2024-01-12 Machine vision model training method and system

Country Status (1)

Country Link
CN (1) CN117853845A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination