CN111950419A - Image information prediction method, image information prediction device, computer equipment and storage medium - Google Patents

Image information prediction method, image information prediction device, computer equipment and storage medium

Info

Publication number
CN111950419A
CN111950419A
Authority
CN
China
Prior art keywords
historical
image
predicted
target
future frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010766986.9A
Other languages
Chinese (zh)
Inventor
魏超时
陈博
易军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
EASTERN CHINA AIR TRAFFIC MANAGEMENT BUREAU CAAC
Original Assignee
EASTERN CHINA AIR TRAFFIC MANAGEMENT BUREAU CAAC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by EASTERN CHINA AIR TRAFFIC MANAGEMENT BUREAU CAAC filed Critical EASTERN CHINA AIR TRAFFIC MANAGEMENT BUREAU CAAC
Priority to CN202010766986.9A priority Critical patent/CN111950419A/en
Publication of CN111950419A publication Critical patent/CN111950419A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Medical Informatics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application relates to an image information prediction method, an image information prediction device, a computer device and a storage medium. The method comprises the following steps: extracting historical images of a plurality of frames from a historical video stream; extracting historical key points in each historical image; acquiring historical key point positions corresponding to the historical key points; predicting the predicted key point positions of future frame images according to the historical key point positions, wherein the future frame images are the next frame images of the historical images in the video stream; calculating a predicted position thermodynamic diagram corresponding to each predicted key point position; performing model training according to the thermodynamic diagrams of the predicted positions and the historical images to obtain a prediction model; and extracting a target historical image from the target video stream, and inputting the target historical image into a prediction model so as to obtain a target future frame image containing the key point information according to the prediction model. By adopting the method, the accuracy of image information prediction can be improved.

Description

Image information prediction method, image information prediction device, computer equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for predicting image information, a computer device, and a storage medium.
Background
Video prediction can be applied to many scenarios, such as predicting the moving tracks of pedestrians, automobiles and the like in automatic driving, predicting the future fields of meteorological elements (temperature, humidity, wind speed and the like), nowcasting of radar echoes, and the like.
In the traditional technology, future key points in future data are predicted directly from historical key points in historical data. Because the information contained in the key points is limited, little information is available during prediction, and the prediction accuracy is low.
Disclosure of Invention
In view of the above technical problems, it is necessary to provide an image information prediction method, apparatus, computer device, and storage medium capable of improving prediction accuracy.
An image information prediction method, the method comprising:
extracting historical images of a plurality of frames from a historical video stream;
extracting historical key points in each historical image;
acquiring historical key point positions corresponding to the historical key points;
predicting the predicted key point positions of future frame images according to the historical key point positions, wherein the future frame images are the next frame images of the historical images in the video stream;
calculating a predicted position thermodynamic diagram corresponding to each predicted key point position;
performing model training according to the thermodynamic diagrams of the predicted positions and the historical images to obtain a prediction model;
and extracting a target historical image from the target video stream, and inputting the target historical image into a prediction model so as to obtain a target future frame image containing image information prediction according to the prediction model.
In one embodiment, calculating a predicted location thermodynamic diagram corresponding to each predicted keypoint location includes:
each predicted keypoint location is converted to a predicted location thermodynamic diagram using a gaussian function.
In one embodiment, performing model training according to thermodynamic diagrams of predicted positions and historical images to obtain a prediction model, includes:
acquiring a pre-cached image corresponding to each predicted key point position;
storing the predicted position thermodynamic diagrams corresponding to the positions of the predicted key points into corresponding pre-cached images;
splicing and fusing the pre-cached images stored with the predicted position thermodynamic diagrams to obtain a comprehensive predicted position thermodynamic diagram;
and carrying out model training according to the comprehensive predicted position thermodynamic diagram and the historical image to obtain a predicted model.
In one embodiment, performing model training according to the integrated predicted position thermodynamic diagram and the historical image to obtain a prediction model, includes:
selecting a latest historical frame image closest to the current time from the historical images of a plurality of frames;
acquiring a future frame image corresponding to the latest historical frame image and a comprehensive latest predicted position thermodynamic diagram corresponding to the future frame image;
inputting the comprehensive latest predicted position thermodynamic diagram and the latest historical frame image into a machine learning model to train the machine learning model to obtain predicted parameters, and obtaining an initial prediction model according to the predicted parameters;
the method comprises the steps of predicting a target historical image by using an initial prediction model to obtain a predicted target future frame image, calculating a deviation value of the predicted target future frame image and an actual target future frame image, continuously adjusting prediction parameters to obtain an adjusted prediction model when the deviation value is larger than a preset threshold value, stopping the prediction parameters until the deviation value between the predicted target future frame image and the actual predicted target future frame image predicted according to the adjusted prediction model is not larger than the preset threshold value, and taking the latest adjusted prediction model as a final prediction model.
In one embodiment, the target future frame image comprises at least one frame of target future image; after obtaining the target future frame image containing the image information prediction according to the prediction model, the method further comprises the following steps:
acquiring the actual image frame number contained in the target future frame image and the preset image frame number of the target future frame image;
comparing the actual number of image frames with the number of preset image frames;
and when the number of actual image frames is less than the preset number of image frames, taking the last frame image in the target future frame images as a target historical image and continuing to input it into the prediction model to obtain a new target future frame image, until the number of actual image frames is not less than the preset number of image frames, at which point obtaining future frame images from the target historical image is stopped.
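The frame-count loop described above can be sketched as follows; `predict_future_frames` and the one-argument `model` callable are illustrative assumptions, since the embodiment does not prescribe a concrete implementation:

```python
def predict_future_frames(model, target_history_image, preset_frame_count):
    """Autoregressive rollout: feed each newly predicted future frame back in
    as the target historical image until the preset number of future frames
    has been produced."""
    future_frames = []
    current = target_history_image
    while len(future_frames) < preset_frame_count:
        current = model(current)        # predict the next frame from the latest one
        future_frames.append(current)
    return future_frames
```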
In one embodiment, obtaining a target future frame image containing image information prediction according to a prediction model includes:
and inputting the target historical images into a prediction model to obtain target future frame images with the same number as the preset number, wherein each target future frame image comprises image information prediction.
In one embodiment, predicting the predicted keypoint locations of the future frame image from the historical keypoint locations comprises:
and inputting the positions of the historical key points into a time sequence network model to obtain the positions of the predicted key points corresponding to the future frame images.
An image information prediction apparatus, comprising:
the historical image extraction module is used for extracting multiple frames of historical images from the historical video stream;
the historical key point extracting module is used for extracting historical key points in each historical image;
the historical position acquisition module is used for acquiring the historical key point positions corresponding to the historical key points;
a future position prediction module for predicting the predicted key point position of a future frame image according to each historical key point position, wherein the future frame image is the next frame image of the historical images in the video stream;
the thermodynamic diagram calculation module is used for calculating a predicted position thermodynamic diagram corresponding to each predicted key point position;
the training module is used for carrying out model training according to the thermodynamic diagrams of the predicted positions and the historical images to obtain a prediction model;
and the prediction module is used for extracting a target historical image from the target video stream, inputting the target historical image into the prediction model and obtaining a target future frame image containing the key point information according to the prediction model.
A computer device comprising a memory storing a computer program and a processor implementing the steps of the method when the processor executes the computer program.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method.
The image information prediction method, the device, the computer equipment and the storage medium comprise the following steps: extracting historical images of a plurality of frames from a historical video stream; extracting historical key points in each historical image; acquiring historical key point positions corresponding to the historical key points; predicting the predicted key point positions of future frame images according to the historical key point positions, wherein the future frame images are the next frame images of the historical images in the video stream; calculating a predicted position thermodynamic diagram corresponding to each predicted key point position; performing model training according to the thermodynamic diagrams of the predicted positions and the historical images to obtain a prediction model; and extracting a target historical image from the target video stream, and inputting the target historical image into a prediction model so as to obtain a target future frame image containing the key point information according to the prediction model. In the process of training the prediction model, thermodynamic diagrams are used for training, more training information is introduced, the prediction model obtained through training has higher prediction accuracy, and image information obtained through prediction according to the prediction model is more accurate.
Drawings
FIG. 1 is a diagram illustrating an exemplary embodiment of a method for predicting image information;
FIG. 2 is a flow diagram illustrating a method for predicting image information according to one embodiment;
FIG. 3 is a flow chart illustrating a method for predicting image information according to another embodiment;
FIG. 4 is a block diagram showing the structure of an image information prediction apparatus according to an embodiment;
fig. 5 is an internal structural view of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The image information prediction method provided by the application can be applied to the application environment shown in fig. 1. Wherein the terminal 102 communicates with the server 104 via a network. The server 104 extracts a plurality of frames of historical images from the historical video stream; extracting historical key points in each historical image; acquiring historical key point positions corresponding to the historical key points; predicting the predicted key point positions of future frame images according to the historical key point positions, wherein the future frame images are the next frame images of the historical images in the video stream; calculating a predicted position thermodynamic diagram corresponding to each predicted key point position; performing model training according to the thermodynamic diagrams of the predicted positions and the historical images to obtain a prediction model; and extracting a target historical image from the target video stream, and inputting the target historical image into a prediction model so as to obtain a target future frame image containing the key point information according to the prediction model. Further, the server 104 may transmit the target future frame image to the terminal 102. The terminal 102 may be, but not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices, and the server 104 may be implemented by an independent server or a server cluster formed by a plurality of servers.
In one embodiment, as shown in fig. 2, an image information prediction method is provided, which is described by taking the method as an example applied to the server in fig. 1, and includes the following steps:
step 210, extracting a plurality of frames of historical images from the historical video stream.
The historical video stream is video stream data used to train the network model. Specifically, the server acquires a historical video stream, such as video stream data collected by a camera in an automatic driving scene, then extracts multiple frames of historical images from the video stream at a preset frequency, and uses these frames as a historical image sequence. In another embodiment, the historical images acquired by the server may be images of the spatial distribution of meteorological elements in the meteorological field, or the like. Further, the server divides the acquired historical images into a training set, a test set and a validation set.
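As an illustrative sketch (the helper names, sampling rate and split ratios are assumptions, not part of the claims), extraction at a preset frequency followed by the three-way split might look like:

```python
def sample_history_frames(video_stream, every_n=5):
    """Keep every n-th frame of the stream as the historical image sequence."""
    return [frame for i, frame in enumerate(video_stream) if i % every_n == 0]

def split_dataset(frames, train=0.7, val=0.15):
    """Split the historical images into training, validation and test sets."""
    n_train = int(len(frames) * train)
    n_val = int(len(frames) * val)
    return (frames[:n_train],                 # training set
            frames[n_train:n_train + n_val],  # validation set
            frames[n_train + n_val:])         # test set
```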
Step 220, extracting historical key points in each historical image.
The key points are characteristic points in the image, such as data points representing the motion characteristics of the image and data points representing effective information of the image. In the field of automatic driving, the key points may be corner position points of a vehicle or skeleton position points of a pedestrian, and in the field of weather, the key points may be the strongest position points in a radar cloud image, position points with the largest edge change, center point positions, and the like.
In a specific implementation, the historical key points may be extracted from the historical images by using a pre-trained feature extraction network, where the feature extraction network may be any network having a feature extraction function, such as a CNN (Convolutional Neural Network) with skip connections, a residual network, and the like.
Step 230, obtaining the historical key point position corresponding to each historical key point.
In one embodiment, after feature extraction is performed on the historical image by using a multi-layer CNN with skip connections to obtain the key points, the method further includes calculating the historical key point position of each historical key point in the corresponding image, where the position information may be a coordinate value.
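A minimal sketch of recovering a coordinate value from a network's response map, assuming the key point is taken as the position of the strongest response (the function name is hypothetical):

```python
import numpy as np

def keypoint_coordinates(feature_map):
    """Return the (row, col) coordinate of the strongest response in a 2-D
    feature map, used here as the key point position (a coordinate value)."""
    return np.unravel_index(np.argmax(feature_map), feature_map.shape)
```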
And 240, predicting the position of a predicted key point of a future frame image according to each historical key point position, wherein the future frame image is the next frame image of the historical images in the video stream.
Further, the extracted historical key point position information corresponding to each historical image is input into a pre-trained position prediction neural network, so that the predicted key points and predicted key point positions in the future frame images are predicted by the position prediction neural network; for example, the coordinate information of the predicted key point positions in the future frame images can be obtained.
In one embodiment, predicting the predicted keypoint locations of the future frame image from the historical keypoint locations comprises: and inputting the positions of the historical key points into a time sequence network model to obtain the positions of the predicted key points corresponding to the future frame images.
Specifically, after the historical key point positions of the historical images are obtained, the historical key point positions are sequentially input into a time-series network model such as an LSTM (Long Short-Term Memory) network model, and the predicted key point positions in the future frame images are obtained through prediction. It should be noted that the LSTM is a time-series network; in other embodiments, it may also be another network with time-series prediction capability, which is not limited herein.
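A minimal NumPy sketch of one LSTM step applied to a sequence of historical key point positions; in practice a full deep-learning framework LSTM would be trained, and all names, shapes and the output projection here are illustrative assumptions:

```python
import numpy as np

def lstm_step(x, h, c, W, U, b):
    """One LSTM cell step. x: (D,) input, h/c: (H,) previous hidden and cell
    states, W: (4H, D) input weights, U: (4H, H) recurrent weights, b: (4H,)."""
    H = h.shape[0]
    z = W @ x + U @ h + b
    i = 1 / (1 + np.exp(-z[:H]))         # input gate
    f = 1 / (1 + np.exp(-z[H:2 * H]))    # forget gate
    o = 1 / (1 + np.exp(-z[2 * H:3 * H]))  # output gate
    g = np.tanh(z[3 * H:])               # candidate cell state
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

def predict_next_position(positions, W, U, b, W_out):
    """Run the historical (x, y) key point positions through the cell in order,
    then project the final hidden state to a predicted next position."""
    H = U.shape[1]
    h, c = np.zeros(H), np.zeros(H)
    for p in positions:
        h, c = lstm_step(np.asarray(p, dtype=float), h, c, W, U, b)
    return W_out @ h
```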
It should be noted that the future frame image is the next frame image after the historical images in the video stream, and there may be one or more future frame images; for example, the future frame image may be the next frame in the video stream closest in time to the historical images, or multiple future frames in the video stream whose time difference from the historical images is within a preset range. That is, the number of future frame images predicted by a time-series network such as the LSTM may be one or more, which is not limited herein.
And step 250, calculating a predicted position thermodynamic diagram corresponding to each predicted key point position.
Specifically, the server inputs the predicted key point positions of the future frame image into a Gaussianization module, and the Gaussianization module converts each predicted key point position into a predicted position thermodynamic diagram with a Gaussianized key point. Specifically, a computer program integrated in the Gaussianization module may perform position-based Gaussian expansion of the predicted key point positions and then obtain the predicted position thermodynamic diagram for the corresponding positions. The thermodynamic diagram can be understood as a probability map: a Gaussian formula is used to convert a key point into a key region, and different positions in the key region correspond to different probabilities, which represent the importance of each position, i.e., the probability that each position is the key position.
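A minimal sketch of the position-based Gaussian expansion, assuming the common heatmap form exp(-((x-cx)^2 + (y-cy)^2) / (2*sigma^2)); the function name and the sigma default are assumptions, since the patent does not fix the Gaussian formula:

```python
import numpy as np

def keypoint_to_heatmap(cx, cy, height, width, sigma=2.0):
    """Expand a predicted key point (cx, cy) into a Gaussian predicted-position
    thermodynamic diagram: each pixel holds a probability-like weight that
    peaks at 1.0 at the key point and decays with distance from it."""
    ys, xs = np.mgrid[0:height, 0:width]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
```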
And step 260, performing model training according to the thermodynamic diagrams of the predicted positions and the historical images to obtain a prediction model.
And taking the historical images and the predicted position thermodynamic diagrams obtained in the steps as a training set, and training a network model according to the training set to obtain a prediction model, so that the trained prediction model can predict a future image containing thermodynamic diagram information according to any input historical images.
In addition, the process of training the prediction model includes verifying the accuracy of the prediction model by using the validation set obtained in step 210; only a model that passes the accuracy verification is retained as the qualified prediction model.
And 270, extracting a target historical image from the target video stream, and inputting the target historical image into a prediction model to obtain a target future frame image containing the key point information according to the prediction model.
The target video stream is a video stream for which future frame images need to be predicted. Specifically, the server extracts one or more frames of target historical images from the target video stream, and then inputs the target historical images into a pre-trained prediction model, so that the prediction model analyzes and identifies the target historical images according to pre-trained prediction parameters to obtain key point information in corresponding target future frame images.
Video prediction can be applied to many scenarios, such as predicting the moving tracks of pedestrians, automobiles and the like in automatic driving; predicting the future fields of meteorological elements (temperature, humidity, wind speed, etc.); nowcasting of radar echoes, and so on. In the video prediction method based on key point prediction, thermodynamic diagrams are used during training of the prediction model, so the prediction model obtains more prior knowledge and the accuracy of the obtained model is greatly improved; a more accurate model then yields a more accurate predicted image, so that, compared with other traditional methods, the accuracy of image prediction is improved.
In one embodiment, calculating a predicted location thermodynamic diagram corresponding to each predicted keypoint location includes: each predicted keypoint location is converted to a predicted location thermodynamic diagram using a gaussian function.
Specifically, a Gaussian function is embedded in the Gaussianization module in advance, so that the predicted key point positions are converted into regionalized predicted position thermodynamic diagrams according to the Gaussian function. In other embodiments, a pre-trained Gaussian network model may be embedded in the Gaussianization module in advance, so as to Gaussianize the predicted key point positions according to the Gaussian network model.
In this embodiment, the predicted key point position is converted into a predicted position thermodynamic diagram by using a gaussian function, so that point information is converted into region information, information expansion is realized, and the accuracy of the model can be improved by using the expanded information to perform model training.
In one embodiment, performing model training according to thermodynamic diagrams of predicted positions and historical images to obtain a prediction model, includes: acquiring a pre-cached image corresponding to each predicted key point position; storing the predicted position thermodynamic diagrams corresponding to the positions of the predicted key points into corresponding pre-cached images; splicing and fusing the pre-cached images stored with the predicted position thermodynamic diagrams to obtain a comprehensive predicted position thermodynamic diagram; and carrying out model training according to the comprehensive predicted position thermodynamic diagram and the historical image to obtain a predicted model.
Specifically, a Gaussian formula is applied to each predicted key point position respectively, converting it into a predicted position thermodynamic diagram at the corresponding coordinate point. Pre-cached images equal in number to the key points are initialized in the server, and the predicted position thermodynamic diagram corresponding to each key point is stored in the corresponding pre-cached image; the pre-cached images may be predefined blank images used to store the predicted position thermodynamic diagrams. The pre-cached images storing the predicted position thermodynamic diagrams are then spliced and fused to generate a comprehensive predicted position thermodynamic diagram, which is output as the thermodynamic diagram of the module.
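The splicing and fusing of the pre-cached images can be sketched as a pixel-wise maximum over the per-key-point heatmaps; this is one plausible fusion rule, since the patent does not fix a specific fusion operator:

```python
import numpy as np

def fuse_heatmaps(heatmaps):
    """Fuse the per-key-point pre-cached heatmap images into one comprehensive
    predicted-position thermodynamic diagram via the pixel-wise maximum, so
    every key point's region survives in the combined map."""
    return np.stack(heatmaps, axis=0).max(axis=0)
```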
In this embodiment, each predicted position thermodynamic diagram is stored in its corresponding pre-cached image, and the pre-cached images are then merged and fused into one overall output. As a result, the size and position of each image need not be considered while each predicted position thermodynamic diagram is generated, which improves adaptability; the size and position of each predicted position thermodynamic diagram can instead be adjusted during combination to suit the overall diagram. The generated comprehensive predicted position thermodynamic diagram contains the information of multiple key points, which increases the amount of information contained in the thermodynamic diagram and further improves the accuracy of training the prediction model with it.
In one embodiment, performing model training according to the integrated predicted position thermodynamic diagram and the historical image to obtain a prediction model, includes: selecting a latest historical frame image closest to the current time from the historical images of a plurality of frames; acquiring a future frame image corresponding to the latest historical frame image and a comprehensive latest predicted position thermodynamic diagram corresponding to the future frame image; and inputting the comprehensive latest predicted position thermodynamic diagram and the latest historical frame image into a machine learning model to train the machine learning model to obtain a predicted parameter, and obtaining an initial predicted model according to the predicted parameter.
Specifically, the frame of historical image closest to the current time is selected from the multiple frames of historical images in chronological order and used as the latest historical image, and the comprehensive latest predicted position thermodynamic diagram of the future frame image predicted from the historical images is acquired; the latest historical image and the latest predicted position thermodynamic diagram are then used as training data to train a machine learning model to obtain an initial prediction model; further, the initial prediction model is adjusted by using the validation set to obtain the final prediction model. It should be noted that one historical image may also be randomly selected from the multiple frames of historical images, and the randomly selected historical image and the latest predicted position thermodynamic diagram may be used as training data for model training.
Further, the process of adjusting the prediction parameters of the initial prediction model to obtain a prediction model with higher prediction accuracy further includes: predicting a target historical image by using the initial prediction model to obtain a predicted target future frame image; calculating a deviation value between the predicted target future frame image and the actual target future frame image; when the deviation value is larger than a preset threshold value, continuously adjusting the prediction parameters to obtain an adjusted prediction model; stopping the adjustment when the deviation value between the target future frame image predicted by the adjusted prediction model and the actual target future frame image is not larger than the preset threshold value; and taking the latest adjusted prediction model as the final prediction model.
In this embodiment, a predicted target future frame image is obtained according to the initial prediction model obtained through training, and the deviation between the predicted target future frame image and the actual target future frame image is then calculated. When the deviation is greater than a preset threshold, the accuracy of the initial prediction model is insufficient, and the prediction parameters of the initial prediction model need to be adjusted further to obtain a prediction model with higher prediction accuracy. The adjustment of the prediction parameters stops when the deviation value between the target future frame image predicted by the adjusted prediction model and the actual target future frame image is not greater than the preset threshold, and the latest adjusted prediction model is taken as the final prediction model.
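The adjust-until-threshold loop described above can be sketched as follows. This is a minimal illustration of the control flow only: the scalar "model" and its update rule are toy stand-ins for the application's neural network and its training procedure, not part of the original disclosure.

```python
def train_until_threshold(predict, adjust, params, history, actual,
                          threshold, max_iters=1000):
    """Adjust prediction parameters until the deviation between the
    predicted and actual target future frame is not greater than the
    preset threshold, then stop adjusting."""
    for _ in range(max_iters):
        predicted = predict(params, history)
        deviation = abs(predicted - actual)
        if deviation <= threshold:
            break  # accuracy requirement met: stop adjusting parameters
        params = adjust(params, predicted, actual)
    return params

# Toy stand-ins: the "model" scales the last historical value by one
# parameter, and the update nudges that parameter to reduce the deviation.
predict = lambda p, h: p * h[-1]
adjust = lambda p, pred, actual: p + 0.025 * (actual - pred)
params = train_until_threshold(predict, adjust, 0.5, [1, 2, 4], 8.0,
                               threshold=0.05)
```

After the loop, predicting from the same history deviates from the actual value by no more than the preset threshold.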
In a specific implementation, the prediction module takes the integrated picture (the comprehensive predicted position thermodynamic diagram) output by the Gaussianization module and a picture containing content, such as the last picture of the historical image sequence, as input to train a prediction model. The prediction module itself is a neural network; the present application places no special requirement on the network itself, whose main function is to generate the corresponding predicted image. Specifically, the network may be a CNN, a VAE, a GAN, or a similar network, optionally with skip connections. The following description takes a variational autoencoder (VAE) as an example.
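As a rough illustration of this VAE-style data flow (splice the inputs, encode to the parameters of a Gaussian prior, sample a latent code, decode into a predicted image), the NumPy sketch below uses small random linear maps as hypothetical stand-ins for the CNN encoder and decoder; the image size, latent dimension, and weights are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, latent_dim = 8, 8, 4

heatmap = rng.random((H, W))     # comprehensive predicted position thermodynamic diagram
last_frame = rng.random((H, W))  # latest historical image containing content
x = np.stack([last_frame, heatmap], axis=0).reshape(-1)  # channel-wise splice, flattened

W_enc = rng.standard_normal((2 * latent_dim, x.size)) * 0.01  # encoder stand-in
W_dec = rng.standard_normal((H * W, latent_dim)) * 0.01       # decoder stand-in

enc_out = W_enc @ x
mu, logvar = enc_out[:latent_dim], enc_out[latent_dim:]  # Gaussian prior parameters
z = mu + np.exp(0.5 * logvar) * rng.standard_normal(latent_dim)  # reparameterization
pred_frame = (W_dec @ z).reshape(H, W)  # reconstructed predicted image
```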
The input comprehensive predicted position thermodynamic diagram is spliced with the latest historical image containing the content to obtain the input data, which is fed into the encoder part (a CNN-based encoder). The encoder outputs a prior distribution (which may be Gaussian). The output of the encoder is fed into the decoder part, which reconstructs the predicted image containing the key point information. During training of the network, a LOSS is calculated between the prediction result image output by the network and the real image, and back propagation is performed to train the prediction network parameters. The prediction model places no special requirement on the LOSS function; for example, the LOSS function shown in formula (1) may be used:

LOSS = MSE(T, T̂)  (1)

where, in formula (1), MSE denotes the mean square error loss function, T is the real image, and T̂ is the predicted image. It should be noted that the LOSS function should correspond to the network; for example, a GAN (Generative Adversarial Network) uses the corresponding GAN loss, and so on.
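A minimal implementation of the mean square error loss in formula (1), assuming the real image T and predicted image T̂ are arrays of pixel values:

```python
import numpy as np

def mse_loss(real, predicted):
    """Mean square error between the real image T and predicted image T-hat:
    the average of the squared per-pixel differences."""
    real = np.asarray(real, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean((real - predicted) ** 2))

# Four pixels, one of which is off by 0.5: mean of [0, 0.25, 0, 0] = 0.0625.
loss = mse_loss([[0.0, 1.0], [1.0, 0.0]], [[0.0, 0.5], [1.0, 0.0]])
```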
In this embodiment, the prediction parameters are continuously adjusted in the process of obtaining the final prediction model, so as to ensure that the precision of the finally obtained prediction model is within the preset range, further ensure the precision of the target future frame image obtained by prediction according to the prediction model, and improve the accuracy of image information prediction.
In one embodiment, the target future frame image comprises at least one frame of target future image. After the target future frame image containing the key point information is obtained according to the prediction model, the method further comprises: acquiring the actual number of image frames contained in the target future frame image and the preset number of image frames of the target future frame image; comparing the actual number of image frames with the preset number of image frames; and when the actual number of image frames is less than the preset number of image frames, taking the last frame image among the target future frame images as a target historical image and inputting it into the prediction model again to obtain a new target future frame image, and stopping obtaining future frame images according to the target historical image once the actual number of image frames is not less than the preset number of image frames.
Specifically, the future frame image obtained by using the prediction model may be one image or multiple images. The server may acquire the actual number of image frames contained in the target future frame image and the preset number of image frames to be obtained, and compare the two. When the actual number of image frames is less than the preset number, the target future frame image obtained through a single model prediction does not meet the number requirement, so the prediction model needs to be used again. Specifically, the process of obtaining the target future frame image by using the prediction model again includes: when the target future frame image comprises only one image, taking that image as the target historical image and inputting it into the prediction model to continue predicting target future frame images; then comparing the total number of target future frame images obtained from the prediction model with the preset number of images; when the total number is not less than the preset number, stopping the step of continuing to predict images with the prediction model; otherwise, inputting the target future frame image closest to the current time into the prediction model as the historical image and continuing the prediction.
In another embodiment, when the target future frame image obtained by using the prediction model comprises multiple images, the last frame image closest to the current time may be taken as the target historical image and input into the prediction model again to obtain a new target future frame image; obtaining future frame images according to the target historical image stops once the actual number of image frames is not less than the preset number of image frames.
In this embodiment, the trained prediction model is used to predict future images to obtain the prediction result of a future frame image. When the number of predicted images does not meet the requirement, a predicted image can be fed back as a target historical image, that is, treated as a true value, as input to the prediction model to continue prediction; in other words, the originally input first frame of the target historical images is removed, and the prediction model is then reused for prediction, finally yielding the prediction results for all future frames. In this embodiment, a preset number of target future frame images can be acquired as required, which improves the applicability of the model.
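The feed-back-the-last-prediction loop can be sketched as follows; `predict_next` is a hypothetical stand-in for the trained prediction model (here a toy that "predicts" the previous value plus one), and the sliding-window bookkeeping is simplified to the essentials.

```python
def rollout(predict_next, history, preset_num_frames):
    """Keep feeding the latest predicted frame back into the model as the
    target historical image until the preset number of future frames has
    been produced, then stop."""
    futures = []
    current = history[-1]
    while len(futures) < preset_num_frames:
        new_frames = predict_next(current)  # one prediction may yield one or more frames
        futures.extend(new_frames)
        current = futures[-1]  # last predicted frame becomes the new "history"
    return futures[:preset_num_frames]

# Toy model: the next frame is the previous value plus one.
frames = rollout(lambda f: [f + 1], history=[0], preset_num_frames=3)  # → [1, 2, 3]
```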
In another embodiment, obtaining a target future frame image containing key point information according to a prediction model includes: and inputting the target historical images into a prediction model to obtain target future frame images with the same number as the preset number, wherein each target future frame image comprises key point information.
In this embodiment, the number of images to be acquired may also be preset, so that target future frame images consistent with the preset number may be directly acquired according to the prediction model, and the preset number of images may be obtained through one-time image prediction, thereby improving the efficiency of acquiring image information.
In one embodiment, as shown in fig. 3, a schematic flow chart of predicting image information in another embodiment is provided, and specifically, the method includes:
Step 310: image sequence input module. Specifically, the server extracts historical images of a preset number of frames from the video stream as the input image sequence and inputs them to the image sequence module.
Step 320: key point generation module. Specifically, the server inputs the historical images of the preset frames into a pre-trained CNN network to extract the key points in each historical frame image and the historical position coordinates of each historical key point, and inputs the position coordinates of each historical key point into an LSTM time-sequence network model to predict the position coordinate information of the key points in the future image.
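To illustrate what the time-sequence network does with the keypoint coordinates, the toy sketch below substitutes constant-velocity extrapolation for the LSTM; this stand-in is an assumption for illustration only and does not reflect the learned dynamics of the actual model.

```python
def predict_next_position(history_positions):
    """Toy stand-in for the LSTM time-sequence network: extrapolate a
    keypoint's next (x, y) position from its last two historical positions
    assuming constant velocity."""
    (x0, y0), (x1, y1) = history_positions[-2], history_positions[-1]
    return (2 * x1 - x0, 2 * y1 - y0)

# A keypoint moving by (+1, +2) per frame continues on that trajectory.
nxt = predict_next_position([(1, 1), (2, 3), (3, 5)])  # → (4, 7)
```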
Step 330: Gaussianization module. The position coordinates of each future key point are input into the Gaussianization module for Gaussian processing to obtain a thermodynamic diagram of each future key point, and the thermodynamic diagrams of all key points are spliced to obtain a total thermodynamic diagram.
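The Gaussianization step can be sketched as follows. The map size, the σ value, and the element-wise-maximum merge used here in place of the splicing operation are illustrative assumptions, not details from the disclosure.

```python
import numpy as np

def keypoint_heatmap(x, y, height, width, sigma=1.5):
    """Render one predicted keypoint position (x, y) as a 2-D Gaussian
    thermodynamic diagram peaking at the keypoint."""
    ys, xs = np.mgrid[0:height, 0:width]
    return np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))

# One map per keypoint; the per-keypoint maps are then merged (here by
# element-wise maximum) into a single total thermodynamic diagram.
keypoints = [(2, 3), (6, 5)]
maps = [keypoint_heatmap(x, y, 8, 8) for x, y in keypoints]
integrated = np.maximum.reduce(maps)
```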
Step 340: prediction module. The future total thermodynamic diagram and the last frame of historical image are taken as input data to train a prediction model. Specifically, the predicted total thermodynamic diagram and the last frame of historical image are spliced to obtain a spliced image, which is input into a CNN-based encoder; the encoder is trained to obtain the prediction model.
Step 350: output picture module. Specifically, in application, the trained prediction model is used to predict future images to obtain the prediction result of the future frame image.
Compared with traditional methods, this AI-based video prediction method offers accurate prediction, low computational complexity, and good real-time performance.
It should be understood that although the steps in the flow charts of fig. 2-3 are shown sequentially as indicated by the arrows, these steps are not necessarily performed in the order indicated. Unless explicitly stated otherwise herein, the order of these steps is not strictly limited, and they may be performed in other orders. Moreover, at least some of the steps in fig. 2-3 may comprise multiple sub-steps or stages, which are not necessarily completed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 4, there is provided an image information prediction apparatus including:
a historical image extracting module 410, configured to extract multiple frames of historical images from the historical video stream.
And a historical key point extracting module 420, configured to extract historical key points in each historical image.
A historical position obtaining module 430, configured to obtain a historical key point position corresponding to each historical key point.
A future position prediction module 440, configured to predict a predicted keypoint position of a future frame image according to each historical keypoint position, where the future frame image is a next frame image of the historical images in the video stream.
And the thermodynamic diagram calculating module 450 is configured to calculate a predicted position thermodynamic diagram corresponding to each predicted key point position.
And the training module 460 is configured to perform model training according to the thermodynamic diagrams of the predicted positions and the historical images to obtain a prediction model.
And the prediction module 470 is configured to extract a target history image from the target video stream, and input the target history image into the prediction model to obtain a target future frame image containing the key point information according to the prediction model.
In one embodiment, the thermodynamic diagram calculation module 450 includes:
and the conversion unit is used for converting the positions of the predicted key points into predicted position thermodynamic diagrams by utilizing a Gaussian function.
In one embodiment, the training module 460 includes:
the cache image acquisition unit is used for acquiring a pre-cache image corresponding to each predicted key point position;
the storage unit is used for storing the predicted position thermodynamic diagrams corresponding to the predicted key point positions into corresponding pre-cache images;
the fusion unit is used for splicing and fusing the pre-cached images in which the predicted position thermodynamic diagrams are stored to obtain a comprehensive predicted position thermodynamic diagram;
and the training unit is used for carrying out model training according to the comprehensive predicted position thermodynamic diagram and the historical image to obtain a predicted model.
In one embodiment, a training unit comprises:
the latest image selecting subunit is used for selecting the latest historical frame image closest to the current time from the historical images of a plurality of frames;
the latest thermodynamic diagram acquisition subunit is used for acquiring a future frame image corresponding to the latest historical frame image and a comprehensive latest predicted position thermodynamic diagram corresponding to the future frame image;
the training subunit is used for inputting the comprehensive latest predicted position thermodynamic diagram and the latest historical frame image into the machine learning model so as to train the machine learning model to obtain a predicted parameter and obtain an initial predicted model according to the predicted parameter;
and the adjusting subunit is used for predicting the target historical image by using the initial prediction model to obtain a predicted target future frame image, calculating a deviation value between the predicted target future frame image and the actual target future frame image, continuing to adjust the prediction parameters to obtain an adjusted prediction model when the deviation value is greater than a preset threshold value, stopping the adjustment of the prediction parameters when the deviation value between the target future frame image predicted by the adjusted prediction model and the actual target future frame image is not greater than the preset threshold value, and taking the latest adjusted prediction model as the final prediction model.
In one embodiment, the image information prediction apparatus further includes:
the number acquisition module is used for acquiring the actual image frame number contained in the target future frame image and the preset image frame number of the target future frame image;
the comparison module is used for comparing the actual image frame number with the preset image frame number;
and the retraining module is used for taking the last frame image in the target future frame images as a target historical image when the number of the actual image frames is less than the number of the preset image frames, continuously inputting the target historical image into the prediction model to obtain a new target future frame image, and stopping obtaining the future frame image according to the target historical image when the number of the actual image frames is not less than the number of the preset image frames.
In one embodiment, the prediction module 470 includes:
and the one-time prediction unit is used for inputting the target historical image into the prediction model to obtain target future frame images equal in number to the preset number, wherein each target future frame image contains key point information.
In one embodiment, the future position prediction module 440 includes:
and the future position prediction unit is used for inputting the positions of the historical key points into the time sequence network model to obtain the positions of the predicted key points corresponding to the future frame image.
For specific limitations of the image information prediction apparatus, reference may be made to the above limitations of the image information prediction method, which are not repeated here. The respective modules in the image information prediction apparatus may be implemented wholly or partially by software, hardware, or a combination thereof. The modules can be embedded in hardware form in, or independent of, the processor in the computer device, or stored in software form in the memory of the computer device, so that the processor can invoke and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 5. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the computer device is used for storing image information prediction data. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement an image information prediction method.
Those skilled in the art will appreciate that the architecture shown in fig. 5 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as a particular computing device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, and the processor implementing the following steps when executing the computer program: extracting multiple frames of historical images from a historical video stream; extracting historical key points in each historical image; acquiring the historical key point positions corresponding to the historical key points; predicting the predicted key point positions of a future frame image according to the historical key point positions, wherein the future frame image is the next frame image of the historical images in the video stream; calculating a predicted position thermodynamic diagram corresponding to each predicted key point position; performing model training according to the predicted position thermodynamic diagrams and the historical images to obtain a prediction model; and extracting a target historical image from a target video stream and inputting the target historical image into the prediction model, so as to obtain a target future frame image containing key point information according to the prediction model.
In one embodiment, the processor, when executing the computer program, further performs the step of calculating a predicted position thermodynamic diagram corresponding to each predicted keypoint position, to: each predicted keypoint location is converted to a predicted location thermodynamic diagram using a gaussian function.
In one embodiment, the processor, when executing the computer program, further performs the step of performing model training based on the thermodynamic diagrams of the respective predicted positions and the historical images to obtain the prediction model, and is configured to: acquiring a pre-cached image corresponding to each predicted key point position; storing the predicted position thermodynamic diagrams corresponding to the positions of the predicted key points into corresponding pre-cached images; splicing and fusing the pre-cached images stored with the predicted position thermodynamic diagrams to obtain a comprehensive predicted position thermodynamic diagram; and carrying out model training according to the comprehensive predicted position thermodynamic diagram and the historical image to obtain a predicted model.
In one embodiment, when executing the computer program, the processor, in carrying out the step of performing model training based on the comprehensive predicted position thermodynamic diagram and the historical image to obtain a prediction model, is further configured to: select the latest historical frame image closest to the current time from the multiple frames of historical images; acquire a future frame image corresponding to the latest historical frame image and a comprehensive latest predicted position thermodynamic diagram corresponding to the future frame image; input the comprehensive latest predicted position thermodynamic diagram and the latest historical frame image into a machine learning model to train the machine learning model to obtain prediction parameters, and obtain an initial prediction model according to the prediction parameters; and predict the target historical image by using the initial prediction model to obtain a predicted target future frame image, calculate a deviation value between the predicted target future frame image and the actual target future frame image, continue adjusting the prediction parameters to obtain an adjusted prediction model when the deviation value is greater than a preset threshold value, stop adjusting the prediction parameters when the deviation value between the target future frame image predicted by the adjusted prediction model and the actual target future frame image is not greater than the preset threshold value, and take the latest adjusted prediction model as the final prediction model.
In one embodiment, the target future frame image comprises at least one frame of target future image; when executing the computer program, the processor, in carrying out the steps subsequent to obtaining the target future frame image containing key point information according to the prediction model, is further configured to: acquire the actual number of image frames contained in the target future frame image and the preset number of image frames of the target future frame image; compare the actual number of image frames with the preset number of image frames; and when the actual number of image frames is less than the preset number of image frames, take the last frame image among the target future frame images as a target historical image and input it into the prediction model again to obtain a new target future frame image, stopping obtaining future frame images according to the target historical image once the actual number of image frames is not less than the preset number of image frames.
In one embodiment, when executing the computer program, the processor, in carrying out the step of obtaining a target future frame image containing key point information according to the prediction model, is further configured to: input the target historical image into the prediction model to obtain target future frame images equal in number to the preset number, wherein each target future frame image contains key point information.
In one embodiment, the processor when executing the computer program when carrying out the step of predicting the predicted keypoint locations of the future frame image from the historical keypoint locations is further configured to: and inputting the positions of the historical key points into a time sequence network model to obtain the positions of the predicted key points corresponding to the future frame images.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which, when executed by a processor, performs the following steps: extracting multiple frames of historical images from a historical video stream; extracting historical key points in each historical image; acquiring the historical key point positions corresponding to the historical key points; predicting the predicted key point positions of a future frame image according to the historical key point positions, wherein the future frame image is the next frame image of the historical images in the video stream; calculating a predicted position thermodynamic diagram corresponding to each predicted key point position; performing model training according to the predicted position thermodynamic diagrams and the historical images to obtain a prediction model; and extracting a target historical image from a target video stream and inputting the target historical image into the prediction model, so as to obtain a target future frame image containing key point information according to the prediction model.
In one embodiment, the computer program when executed by the processor performs the step of calculating a predicted location thermodynamic diagram for each predicted keypoint location is further configured to: each predicted keypoint location is converted to a predicted location thermodynamic diagram using a gaussian function.
In one embodiment, the computer program when executed by the processor further performs the step of performing model training based on the thermodynamic diagrams of the respective predicted positions and the historical images to obtain the prediction model, and is further configured to: acquiring a pre-cached image corresponding to each predicted key point position; storing the predicted position thermodynamic diagrams corresponding to the positions of the predicted key points into corresponding pre-cached images; splicing and fusing the pre-cached images stored with the predicted position thermodynamic diagrams to obtain a comprehensive predicted position thermodynamic diagram; and carrying out model training according to the comprehensive predicted position thermodynamic diagram and the historical image to obtain a predicted model.
In one embodiment, the computer program, when executed by the processor to carry out the step of performing model training according to the comprehensive predicted position thermodynamic diagram and the historical image to obtain a prediction model, further implements: selecting the latest historical frame image closest to the current time from the multiple frames of historical images; acquiring a future frame image corresponding to the latest historical frame image and a comprehensive latest predicted position thermodynamic diagram corresponding to the future frame image; inputting the comprehensive latest predicted position thermodynamic diagram and the latest historical frame image into a machine learning model to train the machine learning model to obtain prediction parameters, and obtaining an initial prediction model according to the prediction parameters; and predicting the target historical image by using the initial prediction model to obtain a predicted target future frame image, calculating a deviation value between the predicted target future frame image and the actual target future frame image, continuing to adjust the prediction parameters to obtain an adjusted prediction model when the deviation value is greater than a preset threshold value, stopping the adjustment of the prediction parameters when the deviation value between the target future frame image predicted by the adjusted prediction model and the actual target future frame image is not greater than the preset threshold value, and taking the latest adjusted prediction model as the final prediction model.
In one embodiment, the target future frame image comprises at least one frame of target future image; the computer program, when executed by the processor to carry out the steps subsequent to obtaining the target future frame image containing key point information according to the prediction model, further implements: acquiring the actual number of image frames contained in the target future frame image and the preset number of image frames of the target future frame image; comparing the actual number of image frames with the preset number of image frames; and when the actual number of image frames is less than the preset number of image frames, taking the last frame image among the target future frame images as a target historical image and inputting it into the prediction model again to obtain a new target future frame image, stopping obtaining future frame images according to the target historical image once the actual number of image frames is not less than the preset number of image frames.
In one embodiment, the computer program, when executed by the processor to carry out the step of obtaining a target future frame image containing key point information according to the prediction model, further implements: inputting the target historical image into the prediction model to obtain target future frame images equal in number to the preset number, wherein each target future frame image contains key point information.
In one embodiment, the computer program when executed by the processor further performs the step of predicting the predicted keypoint locations of the future frame image from the historical keypoint locations further for: and inputting the positions of the historical key points into a time sequence network model to obtain the positions of the predicted key points corresponding to the future frame images.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments can be implemented by a computer program instructing related hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, or the like. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM), among others.
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above examples express only several embodiments of the present application, and their descriptions are specific and detailed, but they should not therefore be construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and improvements can be made without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (10)

1. A method for predicting image information, the method comprising:
extracting historical images of a plurality of frames from a historical video stream;
extracting historical key points in each historical image;
acquiring historical key point positions corresponding to the historical key points;
predicting a predicted keypoint location of a future frame image from each of the historical keypoint locations, wherein the future frame image is a next frame image of the historical image in the video stream;
calculating a predicted position thermodynamic diagram corresponding to each predicted key point position;
performing model training according to the thermodynamic diagrams of the predicted positions and the historical images to obtain a prediction model;
and extracting a target historical image from a target video stream, and inputting the target historical image into the prediction model so as to obtain a target future frame image containing key point information according to the prediction model.
2. The method of claim 1, wherein the calculating a predicted location thermodynamic diagram for each of the predicted keypoint locations comprises:
and converting each predicted key point position into a predicted position thermodynamic diagram by utilizing a Gaussian function.
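The conversion in claim 2 can be sketched as follows. The unnormalized 2-D Gaussian below is a common choice for rendering a keypoint position as a position thermodynamic diagram (heatmap); the map size and the standard deviation `sigma` are assumptions, since the patent does not specify them:

```python
import numpy as np

def keypoint_to_heatmap(x, y, height=64, width=64, sigma=2.0):
    # Render one predicted keypoint position as a 2-D Gaussian
    # "predicted position thermodynamic diagram" (heatmap).
    ys, xs = np.mgrid[0:height, 0:width]
    return np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
```

The map peaks at 1.0 exactly at the predicted keypoint position and decays smoothly away from it.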
3. The method according to claim 1 or 2, wherein performing model training according to each of the predicted position thermodynamic diagrams and the historical images to obtain a prediction model comprises:
acquiring a pre-cached image corresponding to each predicted key point position;
storing the predicted position thermodynamic diagrams corresponding to the positions of the predicted key points into corresponding pre-cache images;
splicing and fusing the pre-cached images in which the predicted position thermodynamic diagrams are stored to obtain a comprehensive predicted position thermodynamic diagram;
and carrying out model training according to the comprehensive predicted position thermodynamic diagram and the historical image to obtain a predicted model.
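The splicing-and-fusing step of claim 3 might look like the following sketch: each pre-cached image holds one keypoint's heatmap, and the buffers are fused into a single comprehensive map. Pixel-wise maximum is one plausible fusion operator; the patent does not name the operator, so it is an assumption here:

```python
import numpy as np

def fuse_heatmaps(per_keypoint_heatmaps):
    # Each pre-cached image stores one predicted keypoint's heatmap.
    stacked = np.stack(per_keypoint_heatmaps, axis=0)  # (K, H, W) splice
    composite = stacked.max(axis=0)                    # pixel-wise fusion
    return composite
```

The composite map preserves every keypoint's peak in a single image that can be paired with the historical image for training.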
4. The method of claim 3, wherein performing model training according to the comprehensive predicted position thermodynamic diagram and the historical images to obtain a prediction model comprises:
selecting a latest historical frame image closest to the current time from the historical images of a plurality of frames;
acquiring a future frame image corresponding to the latest historical frame image and a comprehensive latest predicted position thermodynamic diagram corresponding to the future frame image;
inputting the comprehensive latest predicted position thermodynamic diagram and the latest historical frame image into a machine learning model to train the machine learning model and obtain prediction parameters, and obtaining an initial prediction model according to the prediction parameters;
and predicting a target future frame image from the target historical image using the initial prediction model, calculating a deviation value between the predicted target future frame image and the actual target future frame image, continuing to adjust the prediction parameters to obtain an adjusted prediction model while the deviation value is greater than a preset threshold, stopping the adjustment when the deviation value between the target future frame image predicted by the adjusted prediction model and the actual target future frame image is not greater than the preset threshold, and taking the latest adjusted prediction model as the final prediction model.
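The parameter-adjustment loop in claim 4 reduces to iterating until the deviation value drops to the preset threshold or below. A minimal, model-agnostic sketch follows; the function names and the iteration cap are illustrative, not from the patent:

```python
def adjust_until_within_threshold(params, update_step, deviation,
                                  threshold, max_iters=1000):
    # Keep adjusting the prediction parameters while the deviation between
    # the predicted and actual target future frame exceeds the threshold.
    for _ in range(max_iters):
        if deviation(params) <= threshold:
            break
        params = update_step(params)
    return params
```

In practice `update_step` would be a gradient-descent step on the machine learning model and `deviation` a loss between predicted and actual future frames.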
5. The method according to claim 1, wherein the target future frame image comprises at least one frame of target future image; after obtaining the target future frame image containing the key point information according to the prediction model, the method further comprises the following steps:
acquiring the actual image frame number contained in the target future frame image and the preset image frame number of the target future frame image;
comparing the actual image frame number with the preset image frame number;
and when the actual image frame number is less than the preset image frame number, taking the last frame of the target future frame images as a new target historical image and inputting it into the prediction model to obtain a new target future frame image, repeating this step until the actual image frame number is not less than the preset image frame number, and then stopping obtaining future frame images from the target historical image.
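Claim 5 describes an autoregressive rollout: each predicted frame is fed back as the new target historical image until the preset frame count is reached. A sketch, with the predictor as a stand-in argument:

```python
def roll_out_future_frames(predict_next, target_historical_image,
                           preset_frame_count):
    # Feed the last predicted frame back in as the new target historical
    # image until the actual frame count reaches the preset frame count.
    future_frames = []
    current = target_historical_image
    while len(future_frames) < preset_frame_count:
        current = predict_next(current)
        future_frames.append(current)
    return future_frames
```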
6. The method according to claim 1, wherein said deriving a target future frame image containing keypoint information according to said prediction model comprises:
and inputting the target historical image into the prediction model to obtain a preset number of target future frame images, each of which comprises key point information.
7. The method of claim 1, wherein predicting the predicted keypoint locations of future frame images from each of said historical keypoint locations comprises:
and inputting the positions of the historical key points into a time sequence network model to obtain the positions of the predicted key points corresponding to the future frame images.
8. An image information prediction apparatus, characterized in that the apparatus comprises:
the historical image extraction module is used for extracting multiple frames of historical images from the historical video stream;
the historical key point extracting module is used for extracting historical key points in each historical image;
a historical position obtaining module, configured to obtain a historical key point position corresponding to each historical key point;
a future position prediction module for predicting a predicted keypoint position of a future frame image according to each of the historical keypoint positions, wherein the future frame image is a next frame image of the historical images in the video stream;
the thermodynamic diagram calculation module is used for calculating a predicted position thermodynamic diagram corresponding to each predicted key point position;
the training module is used for carrying out model training according to the thermodynamic diagrams of the predicted positions and the historical images to obtain a prediction model;
and the prediction module is used for extracting a target historical image from a target video stream, inputting the target historical image into the prediction model and obtaining a target future frame image containing key point information according to the prediction model.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
CN202010766986.9A 2020-08-03 2020-08-03 Image information prediction method, image information prediction device, computer equipment and storage medium Pending CN111950419A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010766986.9A CN111950419A (en) 2020-08-03 2020-08-03 Image information prediction method, image information prediction device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111950419A true CN111950419A (en) 2020-11-17

Family

ID=73339191

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010766986.9A Pending CN111950419A (en) 2020-08-03 2020-08-03 Image information prediction method, image information prediction device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111950419A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112465278A (en) * 2021-02-01 2021-03-09 聚时科技(江苏)有限公司 Video prediction method, device, computer equipment and readable storage medium
CN113034580A (en) * 2021-03-05 2021-06-25 北京字跳网络技术有限公司 Image information detection method and device and electronic equipment
CN113223104A (en) * 2021-04-16 2021-08-06 山东师范大学 Cardiac MR image interpolation method and system based on causal relationship
CN115082752A (en) * 2022-05-30 2022-09-20 浙江大华技术股份有限公司 Target detection model training method, device, equipment and medium based on weak supervision
CN115550545A (en) * 2022-08-22 2022-12-30 中国电信股份有限公司 Frame rate adjusting method and device, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110378264A (en) * 2019-07-08 2019-10-25 Oppo广东移动通信有限公司 Method for tracking target and device
US20200042776A1 (en) * 2018-08-03 2020-02-06 Baidu Online Network Technology (Beijing) Co., Ltd. Method and apparatus for recognizing body movement
CN110824587A (en) * 2019-11-01 2020-02-21 上海眼控科技股份有限公司 Image prediction method, image prediction device, computer equipment and storage medium
CN111223143A (en) * 2019-12-31 2020-06-02 广州市百果园信息技术有限公司 Key point detection method and device and computer readable storage medium
CN111464810A (en) * 2020-04-09 2020-07-28 上海眼控科技股份有限公司 Video prediction method, video prediction device, computer equipment and computer-readable storage medium


Similar Documents

Publication Publication Date Title
CN111950419A (en) Image information prediction method, image information prediction device, computer equipment and storage medium
CN109344742B (en) Feature point positioning method and device, storage medium and computer equipment
US10242289B2 (en) Method for analysing media content
US20230022387A1 (en) Method and apparatus for image segmentation model training and for image segmentation
CN112232293A (en) Image processing model training method, image processing method and related equipment
Akan et al. Stretchbev: Stretching future instance prediction spatially and temporally
CN112464912B (en) Robot end face detection method based on YOLO-RGGNet
CN111709471B (en) Object detection model training method and object detection method and device
EP3710993B1 (en) Image segmentation using neural networks
CN111667001A (en) Target re-identification method and device, computer equipment and storage medium
CN111652181B (en) Target tracking method and device and electronic equipment
CN114429641B (en) Time sequence action detection method and device, storage medium and terminal
KR20210032678A (en) Method and system for estimating position and direction of image
Ahmadi et al. Efficient and fast objects detection technique for intelligent video surveillance using transfer learning and fine-tuning
CN112818949A (en) Method and system for identifying delivery certificate characters
CN115115552A (en) Image correction model training method, image correction device and computer equipment
CN113129332A (en) Method and apparatus for performing target object tracking
CN111291611A (en) Pedestrian re-identification method and device based on Bayesian query expansion
CN114241360A (en) Video identification method and device based on self-adaptive reasoning
CN114565955A (en) Face attribute recognition model training and community personnel monitoring method, device and equipment
CN111476060A (en) Face definition analysis method and device, computer equipment and storage medium
CN117336525A (en) Video processing method, device, computer equipment and storage medium
CN111259701B (en) Pedestrian re-identification method and device and electronic equipment
CN112101154A (en) Video classification method and device, computer equipment and storage medium
EP4199498A1 (en) Site model updating method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
AD01 Patent right deemed abandoned

Effective date of abandoning: 20241008