CN113705311A - Image processing method and apparatus, storage medium, and electronic apparatus


Info

Publication number: CN113705311A
Authority: CN (China)
Prior art keywords: image, target, occurrence time, event, time point
Legal status: Pending
Application number: CN202110361952.6A
Other languages: Chinese (zh)
Inventors: 裴翰奇, 常健博, 王任直, 冯铭, 姚建华, 尚鸿, 王晓宁, 郑瀚, 陈星翰
Current Assignee: Tencent Technology Shenzhen Co Ltd; Peking Union Medical College Hospital Chinese Academy of Medical Sciences
Original Assignee: Tencent Technology Shenzhen Co Ltd; Peking Union Medical College Hospital Chinese Academy of Medical Sciences
Application filed by Tencent Technology Shenzhen Co Ltd and Peking Union Medical College Hospital Chinese Academy of Medical Sciences
Priority to CN202110361952.6A
Publication of CN113705311A


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00 - Administration; Management
    • G06Q 10/04 - Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Business, Economics & Management (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Human Resources & Organizations (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Development Economics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Game Theory and Decision Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an image processing method and apparatus, a storage medium, and an electronic apparatus. The method includes: acquiring a target image of a target object, wherein the target image includes a local image region where a target event occurs; inputting the target image into an event occurrence time prediction model, and extracting image features of the target image through a first convolution layer in the event occurrence time prediction model to obtain local image features matched with the local image region; inputting the local image features into a second convolution layer of the event occurrence time prediction model to obtain target global features output by the second convolution layer; and determining event occurrence time information of the target object for the target event based on the target global features. The invention solves the technical problem that the prediction of the actual occurrence time of an event in the related art is not accurate enough.

Description

Image processing method and apparatus, storage medium, and electronic apparatus
Technical Field
The present invention relates to the field of computers, and in particular, to an image processing method and apparatus, a storage medium, and an electronic apparatus.
Background
In daily life, the actual occurrence time of events often needs to be predicted, for example the aging duration of household appliances, the onset duration of a patient's disease, or the growth duration of plants. At present, predicting the occurrence time of an event generally depends on expert experience. For example, the aging duration of a home appliance is estimated by a technician after inspecting its components, the onset duration of a patient is generally estimated by a doctor according to the patient's examination report, and the growth duration of a plant is estimated by an experienced farmer or researcher. Predicting the occurrence time of events in different fields therefore relies on domain experts and places high demands on their professional level, and such manual prediction is inefficient.
Therefore, in the related art, the actual occurrence time of an event is predicted by means of expert experience, and because the reference factors underlying that experience are not comprehensive enough, or the experience itself is flawed, the prediction result is not accurate enough.
Therefore, no effective solution to the above problems exists at present.
Disclosure of Invention
The embodiment of the invention provides an image processing method and device, a storage medium and an electronic device, which are used for at least solving the technical problem that the prediction result of the actual occurrence time of an event in the related art is not accurate enough.
According to an aspect of an embodiment of the present invention, there is provided an image processing method including: acquiring a target image of a target object, wherein the target image comprises a local image area where a target event occurs; inputting the target image into an event occurrence time prediction model, and performing image feature extraction on the target image through a first convolution layer in the event occurrence time prediction model to obtain local image features matched with the local image area, wherein the local image features are used for indicating the progress of the target object in the occurrence of the target event; inputting the local image features into a second convolutional layer of the event occurrence time prediction model to obtain target global features output by the second convolutional layer; and determining event occurrence time information of the target object when the target event occurs based on the target global features.
Optionally, inputting the target image into an event occurrence time prediction model, and performing image feature extraction on the target image through a first convolution layer in the event occurrence time prediction model to obtain a local image feature matching the local image region, including: identifying the local image area in the target image through the first convolution layer, wherein the local image features include image features of the local image area in multiple dimensions; and determining image features of the local image area in the plurality of dimensions by the first convolution layer, wherein the target global feature is used for representing the image features of the local image area in the plurality of dimensions, the plurality of dimensions are matched with the target event, and the image features in the plurality of dimensions are used for representing the progress of the target event of the target object.
Optionally, the determining event occurrence time information of the target object for the target event based on the target global feature includes: inputting the target global feature into a fully connected layer of the event occurrence time prediction model, and obtaining a predicted occurrence time output by the fully connected layer, wherein the predicted occurrence time represents the time between the time point at which the target event started to occur for the target object and the time point at which the target image was captured, and the event occurrence time information includes the predicted occurrence time.
Optionally, the determining event occurrence time information of the target object for the target event based on the target global feature includes: inputting the target global feature into a fully connected layer of the event occurrence time prediction model to obtain a predicted occurrence time output by the fully connected layer, wherein the predicted occurrence time represents the time between the time point at which the target event started to occur for the target object and the time point at which the target image was captured; and determining a predicted occurrence time point based on the predicted occurrence time and a pre-acquired capture time point, wherein the capture time point is the time point at which the target image was captured, the predicted occurrence time point indicates the time point at which the target event started to occur for the target object, and the event occurrence time information includes the predicted occurrence time point.
Optionally, the inputting the target global feature into a fully connected layer of the event occurrence time prediction model to obtain the predicted occurrence time output by the fully connected layer includes: determining, through the fully connected layer, a prediction probability for each duration in a preset duration set; and determining the predicted occurrence time within the duration set according to the prediction probability of each duration.
Optionally, the method further includes: acquiring a sample image set, wherein the sample image set includes a sample image and known event occurrence time information corresponding to the sample image, the sample image is an image obtained by photographing a sample object, and the known event occurrence time information includes the actual time point at which the target event occurred for the sample object, or the duration between that actual time point and the time at which the sample image was captured; and training a sample neural network model with the sample image set until a target loss value of the sample neural network model satisfies a target convergence condition, to obtain the event occurrence time prediction model, wherein the target loss value is a loss value determined from the event occurrence time information output by the sample neural network model and the known event occurrence time information.
Optionally, the acquiring the sample image set includes: acquiring a first sample image, a first actual occurrence time point corresponding to the first sample image, a second sample image, and a target duration, wherein the first sample image and the second sample image are images obtained by photographing a first sample object, the first actual occurrence time point represents the actual time point at which the target event occurred for the first sample object, and the target duration represents the duration between the time point at which the first sample image was captured and the time point at which the second sample image was captured; and determining a second actual occurrence time point corresponding to the second sample image according to the first actual occurrence time point and the target duration, wherein the second actual occurrence time point represents the actual time point at which the target event occurred for the first sample object. The sample image set includes the first sample image and the first actual occurrence time point having a corresponding relationship, and the second sample image and the second actual occurrence time point having a corresponding relationship.
Optionally, the acquiring the sample image set includes: acquiring a third sample image and a third actual occurrence time point corresponding to the third sample image, wherein the third actual occurrence time point represents the actual time point at which the target event occurred for a second sample object; randomly flipping and/or rotating the third sample image to obtain a fourth sample image; and determining the actual occurrence time point corresponding to the fourth sample image as the third actual occurrence time point. The sample image set includes the third sample image and the third actual occurrence time point having a corresponding relationship, and the fourth sample image and the third actual occurrence time point having a corresponding relationship.
According to another aspect of the embodiments of the present invention, there is also provided an image processing apparatus, including: an acquisition module, configured to acquire a target image of a target object, wherein the target image includes a local image region where a target event occurs; a first input module, configured to input the target image into an event occurrence time prediction model and perform image feature extraction on the target image through a first convolution layer in the event occurrence time prediction model to obtain a local image feature matched with the local image region, wherein the local image feature is used to indicate the progress of the target event for the target object; a second input module, configured to input the local image feature into a second convolutional layer of the event occurrence time prediction model and obtain a target global feature output by the second convolutional layer; and a determining module, configured to determine event occurrence time information of the target object for the target event based on the target global feature.
According to still another aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium in which a computer program is stored, wherein the computer program is configured to execute the above-mentioned image processing method when running.
According to still another aspect of the embodiments of the present invention, there is also provided an electronic device, including a memory and a processor, the memory having a computer program stored therein, the processor being configured to execute the image processing method described above through the computer program.
In the embodiments of the present invention, an event occurrence time prediction model is used to predict the event occurrence time information of the target event: the model determines the event occurrence time information according to the local image features of the local image region in the target image. This achieves the purpose of predicting the time at which the target event occurred for the target object, achieves the technical effect of improving the accuracy of the predicted occurrence time of the event, and thereby solves the technical problem that the prediction of the actual occurrence time of an event in the related art is not accurate enough.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a schematic diagram of an application environment of an alternative image processing method according to an embodiment of the invention;
FIG. 2 is a flow chart diagram of an image processing method according to an embodiment of the invention;
FIG. 3 is a schematic diagram of a component aging prediction process according to an embodiment of the present invention;
FIG. 4 is a schematic flow chart of patient onset duration prediction according to an embodiment of the present invention;
FIG. 5 is a schematic illustration of a CT image of a brain according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a VggNet-19 network structure according to an embodiment of the invention;
FIG. 7 is a diagram of a ResNet-34 network architecture according to an embodiment of the present invention;
FIG. 8 is a first schematic diagram of images of a diseased site at different onset times according to an embodiment of the present invention;
FIG. 9 is a second schematic diagram of images of a diseased site at different onset times according to an embodiment of the present invention;
FIG. 10 is a first schematic view of metal surface features at different aging durations in accordance with an embodiment of the present invention;
FIG. 11 is a second schematic view of metal surface features at different aging durations in accordance with an embodiment of the present invention;
FIG. 12 is a schematic diagram of dimensional features of a neural network model acquiring a local image according to an embodiment of the invention;
FIG. 13 is a schematic diagram of the training of an event occurrence time prediction model according to an embodiment of the invention;
FIG. 14 is a schematic diagram of an alternative image processing apparatus according to an embodiment of the present invention;
fig. 15 is a schematic structural diagram of an alternative electronic device according to an embodiment of the invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
According to an aspect of the embodiments of the present invention, an image processing method is provided. As an optional implementation, the image processing method may be applied, but is not limited, to the system environment shown in fig. 1, which includes: terminal device 102, network 110, and server 112.
Optionally, in this embodiment, the terminal device 102 may be a terminal device with a shooting function, and may include, but is not limited to, at least one of the following: mobile phones (such as Android phones, iOS phones, etc.), notebook computers, tablet computers, palm computers, MID (Mobile Internet Devices), PAD, desktop computers, smart televisions, video cameras, B-mode ultrasound machines, X-ray film cameras, etc. The terminal device 102 includes a display 108 that can be used to display a captured image, a processor 106 that processes the image, and a memory 104 that stores data, including but not limited to image data. The network 110 may include, but is not limited to: a wired network, a wireless network, wherein the wired network comprises: a local area network, a metropolitan area network, and a wide area network, the wireless network comprising: bluetooth, WIFI, and other networks that enable wireless communication. The server 112 may be a single server, a server cluster composed of a plurality of servers, or a cloud server. The server includes a database 114 for storing data including, but not limited to, image data, model architecture and parameters of a model for predicting the time of occurrence of an event, and a processing engine 115. The processing engine is used for processing the image. The above is merely an example, and this is not limited in this embodiment.
As an optional implementation, as shown in fig. 2, the image processing method includes the following steps:
step S202, acquiring a target image of a target object, wherein the target image comprises a local image area where a target event occurs;
step S204, inputting the target image into an event occurrence time prediction model, and extracting image features of the target image through a first convolution layer in the event occurrence time prediction model to obtain local image features matched with the local image area, wherein the local image features are used for showing the progress of the target object in the occurrence of the target event;
step S206, inputting the local image characteristics into a second convolution layer of the event occurrence time prediction model to obtain target global characteristics output by the second convolution layer;
step S208, determining event occurrence time information of the target object for the target event based on the target global feature.
Through the above steps, an event occurrence time prediction model is used to predict the event occurrence time information of the target event according to the local image features of the local image region in the target image. This achieves the purpose of predicting the time at which the target event occurred for the target object, realizes the technical effect of improving the accuracy of the predicted occurrence time of the event, and solves the technical problem that the prediction of the actual occurrence time of an event in the related art is not accurate enough.
The image processing method can be applied to any scenario in which the occurrence time of an event needs to be predicted, including but not limited to: the aging time of electrical equipment, the onset time of a patient's disease, the growth time of a plant, and the like. The event occurrence time information includes the time point at which the target event started to occur, or the duration from that time point to the time point at which the target image was captured.
As an optional implementation, taking prediction of the aging duration of electrical equipment as an example, the target object may be the electrical equipment, and either the aging duration or the time point at which aging began may be predicted, where the aging duration is the period from the time the equipment began to age to the time the target image was captured. The aging duration or aging time point of a particular component in the equipment can likewise be predicted; the target event is the occurrence of aging. In this embodiment, taking prediction of the aging duration of components as an example, the electrical equipment is photographed and the image includes the aged component. Because some components occupy particular positions inside the equipment, it is difficult to photograph only the aged component; the image may therefore include other parts of the equipment, and the image features of the aged component need to be extracted, the local image region being the region of the aged component. The event occurrence time prediction model can be a neural network such as VggNet or ResNet, which is trained with training data consisting of images of aged components of the same type, such as motherboard circuits, with different aging durations, for example images of components aged 1 month, half a year, or 1 year. The training data may be selected according to the actual situation; the above examples merely illustrate this embodiment and are not limiting.
As shown in fig. 3, which is a schematic diagram of a component aging prediction process according to an alternative embodiment of the present invention, predicting an aging duration of a component in an electrical device may include the following steps:
and step S31, acquiring images of the components with different aging degrees, and marking the aging duration of the electric devices corresponding to the images. The components are taken as the mainboard circuit as an example, the mainboard circuits with different aging degrees are shot to obtain a group of mainboard circuit images with different aging degrees, the mainboard circuit images with different aging degrees can be marked according to expert experience, and the aging duration of the components in the images is marked.
Step S32: train the initial neural network model with the images of components of different degrees of aging and the corresponding aging durations as training data, to obtain the event occurrence time prediction model. The model can be a deep neural network such as VggNet or ResNet. The network is trained with the training data, and the model parameters are adjusted repeatedly until the loss function between the aging duration output by the model and the known aging duration in the training data satisfies a predetermined convergence condition; for example, when the output value of the convergence function between the predicted and known aging durations falls within a preset range, training stops and the event occurrence time prediction model is obtained.
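By way of illustration only, the convergence check described above can be sketched in Python as follows; the mean-absolute-error form and the threshold value are assumptions for the example and are not prescribed by this embodiment:

```python
# Minimal sketch of the stopping rule: training stops once the loss value
# between predicted and known aging durations falls within a preset range.
# The 0.5 threshold and all names are illustrative assumptions.
def mean_absolute_error(predicted, known):
    """MAE between predicted and labeled aging durations (e.g., in months)."""
    return sum(abs(p - k) for p, k in zip(predicted, known)) / len(predicted)

def has_converged(predicted, known, threshold=0.5):
    """Return True when the convergence condition is satisfied."""
    return mean_absolute_error(predicted, known) <= threshold
```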
Step S33: predict the aging duration of a component with the event occurrence time prediction model. For a component whose aging duration is unknown, its image can be input into the trained model. The first few convolutional layers of the model extract low-dimensional and local features from the input 2D/3D image, yielding the image features of the local image region, that is, features of the aged component such as its aged area and color. As the receptive field expands, the deeper convolutional layers further extract abstract and global features on the basis of the features output by the earlier layers; the second convolutional layer of the event occurrence time prediction model can be these later layers of the network, which yield the global feature of the local image region. The global feature represents the image features of the local image region in multiple dimensions, which may be shape, size, color, position, and so on; that is, the second convolutional layer uses the global feature to represent the image features of the local image region in those dimensions. These multi-dimensional image features indicate the progress of the event: for example, the larger the aged area and the darker its color, the longer the component has been aging.
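By way of illustration only, the two-stage structure described above (first convolution layers for local features, second convolution layers for the global feature, and a fully connected layer for the scalar output) can be sketched in PyTorch as follows; the layer sizes and the exact split between the two stages are assumptions for the example, and an actual implementation may equally use a VggNet or ResNet backbone:

```python
import torch
import torch.nn as nn

class EventTimePredictor(nn.Module):
    """Sketch: early conv layers extract local features of the aged/diseased
    region, deeper conv layers build a global feature, and a fully connected
    layer maps it to a scalar predicted occurrence duration."""
    def __init__(self):
        super().__init__()
        # "First convolutional layer(s)": low-dimensional, local features.
        self.local_extractor = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # "Second convolutional layer(s)": larger receptive field, global feature.
        self.global_extractor = nn.Sequential(
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, 1)  # scalar: predicted occurrence duration

    def forward(self, x):
        local_features = self.local_extractor(x)
        global_feature = self.global_extractor(local_features).flatten(1)
        return self.fc(global_feature)

# Example: one single-channel 128x128 image -> one predicted duration.
model = EventTimePredictor()
duration = model(torch.randn(1, 1, 128, 128))
```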
In this embodiment, the neural network model is trained with images of aged components to obtain the event occurrence time prediction model. The model extracts the local image region of the aged component, analyzes its image features, represents those features in multiple dimensions such as shape, size and color through the global feature, and predicts the aging duration of the component from the global feature. This removes the dependence on expert experience in the prior art, which made prediction inefficient, and thereby improves the efficiency of predicting component aging duration.
As an alternative embodiment, the present application can also predict the onset time of a patient or an animal. The onset time may be the onset duration from the time point at which the disease began to the time point at which the target image was captured, or the time point at which the disease began. Taking prediction of a patient's onset duration as an example: in actual medical scenarios, an important piece of reference information when a doctor formulates a treatment plan is the patient's specific onset duration. The onset duration affects not only the treatment difficulty and surgical risk but also the patient's prognosis. However, in some situations the patient may have lost consciousness or been unaccompanied at onset, so an accurate onset duration cannot be given during diagnosis and treatment. In this embodiment, predicting onset time with deep learning and artificial intelligence allows the real onset time of a patient to be predicted with relatively high precision, assisting doctors in making customized treatment decisions. The target object may be an organ of the patient, for example the head or the chest, which can be imaged by conventional techniques to obtain an organ image, for example a CT image or a B-mode ultrasound image. Because the exact diseased position is usually unknown, the whole organ containing the diseased site is typically imaged; the event occurrence time prediction model extracts the image features of the local image region of the diseased site, represents them in multiple dimensions through the global feature, and predicts the patient's onset duration or onset time point from the global feature, where the onset duration refers to the interval from the time the disease began to the time the image was captured.
Referring to fig. 4, which is a schematic flow chart of patient onset duration prediction according to an alternative embodiment of the present invention, predicting the patient's onset duration may include the following steps:
and step S41, acquiring images of the diseased organs with different disease duration, and marking corresponding disease duration. The captured image may be a medical scan image, such as a CT image, and fig. 5 is a schematic diagram of a CT image of the brain according to an alternative embodiment of the present invention, in which the areas indicated by arrows S1, S2, and S3 are affected areas. In this embodiment, a number of medical scan images of a patient and corresponding patient episodes may be collected as a training data set and a validation data set of the network. The training data set is used for training and fitting the neural network, and the verification set is used for selecting the network model with the best generalization performance in the training process.
Step S42: perform data enhancement on the data set using methods such as random flipping and rotation to enrich the diversity of the data. Because image data of the medical image category are often difficult to acquire, the amount of available data is generally small, and the network cannot be trained fully and effectively. To address this, this embodiment uses methods such as random flipping and rotation to augment the data set, which enriches the diversity of the data and improves the training effect.
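By way of illustration only, such data enhancement can be sketched with torchvision as follows; the flip probabilities and rotation range are assumptions for the example, and the known onset duration of the original image is kept unchanged for the augmented image:

```python
# Sketch of random flipping/rotation data enhancement; parameters are assumptions.
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomVerticalFlip(p=0.5),
    transforms.RandomRotation(degrees=15),
])

def augment_sample(image, onset_duration):
    """Return an extra training pair whose label equals the original label."""
    return augment(image), onset_duration
```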
Step S43: initialize the deep learning network model. Before training, the network parameters are initialized using a random initialization method.
Step S44: train the model with the training data set. When the input medical image is a 2D image (e.g., an X-ray scan), neural network models such as VggNet or ResNet can be used to process it, as shown in fig. 6, which is a schematic diagram of a VggNet-19 network structure according to an alternative embodiment of the present invention, and fig. 7, which is a schematic diagram of a ResNet-34 network structure according to an alternative embodiment of the present invention. When the input medical image is a 3D image (e.g., a CT or MRI scan), the 2D convolutional layers in the network can be replaced with 3D convolutional layers, and a corresponding structure such as 3D VggNet or 3D ResNet can be used. In the neural network of this embodiment, the first several convolutional layers extract low-dimensional and local features from the input 2D/3D image; as the receptive field expands, the deeper convolutional layers further extract abstract and global features on the basis of the features output by the earlier layers. Finally, a fully connected layer integrates the features obtained by the last convolutional layer into a scalar output. During training, the input of the network is the patient's medical scan image, and the scalar output corresponds to the patient's real onset duration. In this embodiment, MAE can be used as the loss function of the network and the Adam algorithm as the optimization method. After multiple iterations, once the model has essentially fitted the training data, the network with the best performance on the pre-prepared validation set is selected as the event occurrence time prediction model.
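By way of illustration only, the training procedure described above (MAE loss, Adam optimizer, model selection on the validation set) can be sketched as follows; the model, data loaders, learning rate and epoch count are assumptions for the example:

```python
import torch
import torch.nn as nn

# Assumptions: `model` is the network sketched earlier (or a VggNet/ResNet variant),
# and `train_loader` / `val_loader` yield (image, onset_duration) batches.
criterion = nn.L1Loss()  # MAE between predicted and real onset durations
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

best_val_mae, best_state = float("inf"), None
for epoch in range(100):
    model.train()
    for images, durations in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images).squeeze(1), durations)
        loss.backward()
        optimizer.step()

    # Keep the weights that generalize best on the validation set.
    model.eval()
    with torch.no_grad():
        val_mae = sum(criterion(model(x).squeeze(1), y).item()
                      for x, y in val_loader) / len(val_loader)
    if val_mae < best_val_mae:
        best_val_mae, best_state = val_mae, model.state_dict()
```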
Step S45: predict the patient's onset duration with the trained event occurrence time prediction model. After training, the 2D/3D medical scan image of a patient whose onset duration is unknown can be input into the trained network, and the scalar output by the last fully connected layer of the network is taken as the predicted onset duration of the patient. Specifically, after receiving an input image, the network first extracts features through several convolutional layers: the first few layers extract local features of the 2D/3D image, and the subsequent deeper layers further integrate global features with strong generalization on the basis of the local features. After feature extraction, the network integrates the feature vector from the last convolutional layer through a fully connected layer and outputs a numerical scalar, which is the network's predicted value of the patient's real onset duration.
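By way of illustration only, inference with the selected model then reduces to a single forward pass; in this sketch, `best_state` and `scan` are the assumed saved weights from the previous sketch and a preprocessed image tensor:

```python
# Sketch of prediction for a patient whose onset duration is unknown.
model.load_state_dict(best_state)
model.eval()
with torch.no_grad():
    predicted_onset_duration = model(scan.unsqueeze(0)).item()  # scalar, e.g. in days
```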
In this embodiment, the neural network model is trained with patients' medical images, and the trained event occurrence time prediction model is used to predict images with unknown onset duration, which saves the time needed to estimate onset duration and improves doctors' diagnosis and treatment efficiency.
Optionally, inputting the target image into an event occurrence time prediction model, and performing image feature extraction on the target image through a first convolution layer in the event occurrence time prediction model to obtain a local image feature matching the local image region, including: identifying the local image area in the target image through the first convolution layer, wherein the local image features include image features of the local image area in multiple dimensions; and determining image features of the local image area in the plurality of dimensions by the first convolution layer, wherein the target global feature is used for representing the image features of the local image area in the plurality of dimensions, the plurality of dimensions are matched with the target event, and the image features in the plurality of dimensions are used for representing the progress of the target event of the target object.
As an alternative embodiment, the acquired image normally contains other parts of the target object. For example, when predicting the aging duration of a component on a device, the component is photographed and the captured image may include other parts of the device; in this case the local image region of the aged component is required, that is, the region the aged component occupies in the image. Likewise, because the diseased site of a patient cannot be located precisely in advance, a certain range around it has to be imaged: for example, the whole brain is scanned when the patient has a headache, and the whole lung when the patient coughs, so large disease-free areas appear in the captured scan. To better predict the onset duration of the diseased site, the local image region, i.e., the region where the diseased site is located, can be identified.
As an alternative embodiment, the occurrence duration of an event may be affected by factors in multiple dimensions, which may be size, color, position, shape, and the like. As shown in fig. 8 and fig. 9, which are schematic diagrams of images of the diseased site at different onset times, the areas indicated by arrows S1, S2 and S3 are diseased areas. It can be seen that the longer the onset time, the larger the area of the diseased site, the darker its color, the closer it lies to important organs of the patient, and the more its shape extends toward other parts. Therefore, analyzing the image features of the diseased site across multiple dimensions such as area, position, shape and color can improve the accuracy of predicting the patient's onset time.
As an alternative embodiment, the factors that typically affect how long a device has been aging may be size, color, and the like. For example, for a metal component, the longer the aging time, the larger the oxidation range; fig. 10 and fig. 11 are schematic diagrams of metal surface features with different aging durations, from which it can be seen that the longer the aging time, the larger the oxidized area of the metal surface and the darker its color. Therefore, the size and color dimensions can serve as the basis for judging the aging duration of a metal device. Similarly, the factors influencing a patient's onset duration may be the area, color, position, shape and so on of the diseased site. In this embodiment, analyzing multiple dimensions of the local image region in the captured image improves the accuracy of predicting the component's aging duration.
Optionally, the determining event occurrence time information of the target object for the target event based on the target global feature includes: inputting the target global feature into a fully connected layer of the event occurrence time prediction model, and obtaining a predicted occurrence time output by the fully connected layer, wherein the predicted occurrence time represents the time between the time point at which the target event started to occur for the target object and the time point at which the target image was captured, and the event occurrence time information includes the predicted occurrence time.
As an alternative embodiment, the event occurrence time information may be the duration from the time point at which the target event started to occur to the time point at which the target image was captured. The fully connected layer of the event occurrence time prediction model integrates the global features obtained by the last convolutional layer into a scalar output, which represents the prediction result of the model: the predicted occurrence time, that is, the time between the start of the target event for the target object and the capture of the target image. For example, this may be the period from the time a component began to age to the time the image containing the aged component was captured, or the period between the time a patient's disease began and the time the image was taken. When predicting a patient's onset duration, the duration can be determined from the area, color, position and shape of the diseased site, and the target global feature can represent the features of the diseased area (the local image region) in multiple dimensions such as shape, size and color. The target global feature is input into the fully connected layer of the event occurrence time prediction model, which outputs the patient's predicted onset duration, that is, the time between the onset time point and the time point at which the image was captured. This embodiment can predict the patient's onset time, assist doctors in diagnosis and treatment, and improve medical efficiency.
Optionally, the determining event occurrence time information of the target object for the target event based on the target global feature includes: inputting the target global feature into a fully connected layer of the event occurrence time prediction model to obtain a predicted occurrence time output by the fully connected layer, wherein the predicted occurrence time represents the time between the time point at which the target event started to occur for the target object and the time point at which the target image was captured; and determining a predicted occurrence time point based on the predicted occurrence time and a pre-acquired capture time point, wherein the capture time point is the time point at which the target image was captured, the predicted occurrence time point indicates the time point at which the target event started to occur for the target object, and the event occurrence time information includes the predicted occurrence time point.
As an alternative embodiment, the event occurrence time information may be the time point at which the target event started to occur. The fully connected layer of the event occurrence time prediction model integrates the global features obtained by the last convolutional layer into a scalar output, which represents the predicted occurrence time, that is, the time between the start of the target event for the target object and the capture of the target image; for example, the period from the time a component began to age to the time the image containing it was captured, or the period between the time a patient's disease began and the time the image was taken. The time point at which the target event started can then be determined from the predicted occurrence duration. When predicting a patient's onset time point, the target global feature represents the features of the diseased area (the local image region) in multiple dimensions such as shape, size and color; inputting it into the fully connected layer of the event occurrence time prediction model yields the predicted onset duration, i.e., the time between the onset time point and the time point at which the image was captured. The time at which each patient image was acquired is known; for example, the acquisition time is recorded on both CT images and B-mode ultrasound images. The onset time point can therefore be obtained from the predicted duration and the capture time point of the image: for example, if the predicted onset duration is 3 days and the image was captured on January 4, 2020, the patient's onset time point is determined to be January 1, 2020.
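By way of illustration only, the conversion from the predicted onset duration to the onset time point is simple date arithmetic; the following sketch uses Python's standard library and the example values from the text above:

```python
from datetime import date, timedelta

def onset_time_point(capture_date: date, predicted_duration_days: float) -> date:
    """Predicted onset time point = capture time point - predicted onset duration."""
    return capture_date - timedelta(days=predicted_duration_days)

# Example from the text: duration of 3 days, image captured on 2020-01-04.
print(onset_time_point(date(2020, 1, 4), 3))  # 2020-01-01
```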
As an alternative embodiment, the event occurrence time prediction model can identify local features of the image and extract multiple dimensions of those local features, such as size, shape, position and color. Fig. 12 is a schematic diagram of the dimensional features of a local image obtained by the neural network model according to an alternative embodiment of the present invention: an image captured of the patient is input into the neural network model, which extracts low-dimensional and local features from the input image to obtain the local image, and the event occurrence time prediction model then extracts image features of multiple dimensions from that local image, where T-1, T-2, ... T-N in the figure may respectively represent size, shape, position, color, and so on. The neural network model integrates the multi-dimensional features T-1, T-2, ... T-N into the global feature Ty of the local image, so that Ty represents the local features of the dimensions T-1, T-2, ... T-N, and the actual occurrence time of the event can be predicted from the global feature Ty. In this embodiment, image features are extracted by the neural network model and the occurrence time of the event is predicted from them, which improves accuracy. Moreover, the local image of the captured image can be extracted by the neural network model and its multi-dimensional features analyzed, which avoids the inefficiency of manually annotating the region of interest in the prior art: the region of interest is extracted automatically, improving the efficiency of predicting event occurrence time.
Optionally, the inputting the target global feature into a fully connected layer of the event occurrence time prediction model to obtain the predicted occurrence time output by the fully connected layer includes: determining, through the fully connected layer, a prediction probability for each duration in a preset duration set; and determining the predicted occurrence time within the duration set according to the prediction probability of each duration.
As an alternative embodiment, as shown in fig. 12, the fully connected layer in the neural network model can predict the occurrence time of the event from the global feature, and the outputs can be probability values for different duration ranges. For example, Ty1', Ty2', ... Tym' in fig. 12 may respectively represent different durations: Ty1' may represent one week (that is, the onset time is one week before the image was captured), Ty2' two weeks, and Tym' three weeks, and the neural network model outputs a probability value for each of Ty1', Ty2', ... Tym'. In this embodiment, the predicted occurrence duration is determined within a preset duration set; because a preset duration set is used in training, different duration sets can be configured for different training targets. Suppose the onset duration is being predicted and the neural network model gives a probability of 10% for Ty1' (one week), 60% for Ty2' (two weeks), and 30% for Ty3' (three weeks); the duration with the highest probability value is taken as the onset duration, that is, two weeks. If the patient's image was captured on March 29, 2020, the actual onset time can be estimated as March 15, 2020. In this embodiment, probability values for the patient's different onset durations are obtained through the fully connected layer and the duration with the maximum probability is selected as the onset duration, which improves the accuracy of predicting the actual occurrence duration of the event.
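By way of illustration only, this classification-style output can be sketched as follows; the preset duration set of one, two and three weeks follows the Ty1', Ty2', Ty3' example above, and the logit values are assumptions chosen to approximately reproduce the 10%/60%/30% probabilities:

```python
import torch

# Assumed preset duration set, in weeks, matching the Ty1'..Tym' example above.
preset_durations_weeks = [1, 2, 3]

def predict_duration(logits: torch.Tensor) -> int:
    """`logits` is the fully connected layer's output, one value per preset duration."""
    probabilities = torch.softmax(logits, dim=-1)  # e.g. approx. [0.10, 0.60, 0.30]
    return preset_durations_weeks[int(torch.argmax(probabilities))]

print(predict_duration(torch.tensor([0.1, 1.9, 1.2])))  # -> 2 (weeks)
```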
Optionally, the method further includes: acquiring a sample image set, wherein the sample image set includes a sample image and known event occurrence time information corresponding to the sample image, the sample image is an image obtained by photographing a sample object, and the known event occurrence time information includes the actual time point at which the target event occurred for the sample object, or the duration between that actual time point and the time at which the sample image was captured; and training a sample neural network model with the sample image set until a target loss value of the sample neural network model satisfies a target convergence condition, to obtain the event occurrence time prediction model, wherein the target loss value is a loss value determined from the event occurrence time information output by the sample neural network model and the known event occurrence time information.
As an alternative embodiment, the initial neural network model can be trained with the sample image set, and the trained event occurrence time prediction model is obtained through repeated adjustment of the model parameters. Training stops when the predicted time output by the model and the known time of the image satisfy a predetermined convergence condition; for example, the convergence condition may be that the output value of a convergence function between the predicted time and the known time falls within a preset range. Fig. 13 is a schematic diagram of training the event occurrence time prediction model according to an alternative embodiment of the present invention, in which the inputs are sample images P1, P2, ... Pi, which may be images of aged components or images captured of patients' diseased sites. The outputs of the model are the predicted durations Ty1', Ty2', ..., Tyi', and the actual durations of the samples are the known Ty1, Ty2, ..., Tyi. Taking prediction of a patient's onset time as an example, the sample images P1, P2, ... Pi may be images of diseased sites with known onset durations Ty1, Ty2, ..., Tyi. The initial neural network model is trained for a j-th round with the sample images P1, P2, ... Pi to obtain the j-th round estimated onset durations output by the network, and the loss value between these estimates and the known onset durations is calculated. If the loss value satisfies the predetermined convergence condition, training stops and the event occurrence time prediction model is obtained; otherwise, a (j+1)-th round of training is performed until the loss between the output of the (j+1)-th round and the known onset durations satisfies the condition, where j is an integer. In this embodiment, the model is trained with effective training data and its parameters are adjusted through repeated training to obtain an accurate prediction model, so that a more accurate predicted occurrence time can be obtained through the trained model. In addition, the resulting event occurrence time prediction model can predict the actual occurrence time of events, which improves the efficiency of predicting event occurrence time.
Optionally, the acquiring the sample image set includes: acquiring a first sample image, a first actual occurrence time point corresponding to the first sample image, a second sample image, and a target duration, wherein the first sample image and the second sample image are images obtained by photographing a first sample object, the first actual occurrence time point represents the actual time point at which the target event occurred for the first sample object, and the target duration represents the duration between the time point at which the first sample image was captured and the time point at which the second sample image was captured; and determining a second actual occurrence time point corresponding to the second sample image according to the first actual occurrence time point and the target duration, wherein the second actual occurrence time point represents the actual time point at which the target event occurred for the first sample object. The sample image set includes the first sample image and the first actual occurrence time point having a corresponding relationship, and the second sample image and the second actual occurrence time point having a corresponding relationship.
As an alternative embodiment, a large amount of training data is required to train the neural network model, yet some types of image data, such as medical images, are difficult to acquire, so the amount of available data is generally small. In this embodiment, the same sample object may be photographed at different times, and the event occurrence duration at the other shooting times may be calculated from the event occurrence duration known at one shooting time, so that the actual event occurrence durations corresponding to images of the sample taken at different times are all obtained, thereby increasing the amount of training data. For example, for a stroke patient, a first sample image P1 is taken at a first time t1, and the actual onset duration corresponding to the first sample image is determined by a doctor to be h days. The same patient is photographed again at a second time t2 to obtain a second sample image P2, but the patient is not examined by a doctor at that time, so the actual onset duration corresponding to the second sample image is unknown. In this embodiment, the interval f between the first time t1 and the second time t2 may be calculated as f = t2 - t1, and the actual onset duration corresponding to the second sample image P2 may then be calculated as f + h. In this way, the amount of training data can be increased and the accuracy with which the model predicts the actual occurrence time of the event can be improved.
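The duration arithmetic in this example is simple enough to show directly. Below is a small sketch, assuming the shooting times and the doctor-confirmed onset duration are held as datetime values; all concrete values and variable names are illustrative only.

```python
# Sketch of extrapolating a known onset duration to a second scan of the
# same patient; t1, t2 and h are illustrative values, not real data.
from datetime import datetime, timedelta

t1 = datetime(2021, 3, 1, 9, 0)    # time the first sample image P1 was taken
t2 = datetime(2021, 3, 4, 9, 0)    # time the second sample image P2 was taken
h = timedelta(days=5)              # onset duration at t1, confirmed by a doctor

f = t2 - t1                        # target duration between the two scans
onset_duration_p2 = h + f          # known onset duration assigned to P2
print(onset_duration_p2)           # 8 days, 0:00:00
```

The second sample image P2 thus inherits a known onset duration of h + f without requiring an additional clinical assessment, which is what allows it to be added to the training data.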
Optionally, the acquiring the sample image set includes: acquiring a third sample image and a third actual occurrence time point corresponding to the third sample image, wherein the third actual occurrence time point represents the actual time point at which the target event occurred in a second sample object; randomly flipping and/or rotating the third sample image to obtain a fourth sample image; and determining the actual occurrence time point corresponding to the fourth sample image as the third actual occurrence time point. The sample image set includes the third sample image and the third actual occurrence time point having a corresponding relationship, and the fourth sample image and the third actual occurrence time point having a corresponding relationship.
As an alternative embodiment, the third sample image is an image with a known event occurrence time point, for example, an image of a patient whose disease onset time point is known, or an image of a component whose aging duration is known. In order to increase the amount of training data, the third sample image may be randomly flipped and/or rotated, and the actual event occurrence time point of the third sample image may be used as the actual event occurrence time point of the fourth sample image obtained by this processing. The third sample image with its corresponding event occurrence time point, and the fourth sample image with its corresponding actual event occurrence time point, are then used as training data for the neural network model, which increases the amount of training data and improves the accuracy with which the event occurrence time prediction model predicts the event occurrence time.
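A minimal sketch of this augmentation is given below, assuming torchvision-style transforms; the transform parameters, the file path and the time-point value are illustrative assumptions, not values prescribed by this embodiment.

```python
# Sketch of flip/rotation augmentation that reuses the known occurrence
# time point as the label of the augmented image.
import torchvision.transforms as T
from PIL import Image

augment = T.Compose([
    T.RandomHorizontalFlip(p=0.5),
    T.RandomVerticalFlip(p=0.5),
    T.RandomRotation(degrees=15),
])

third_sample = Image.open("third_sample.png")      # illustrative path
third_occurrence_time = "2021-03-01T09:00:00"      # known occurrence time point

fourth_sample = augment(third_sample)              # geometry changes only
fourth_occurrence_time = third_occurrence_time     # label is inherited unchanged
```

Because flipping and rotation change only the geometry of the image and not the depicted progress of the event, the augmented image can reasonably carry the same occurrence time point as its source.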
The present application provides a new method for predicting the actual occurrence time of an event, which can be applied to any scene in which the actual occurrence time of an event needs to be predicted. For example, the aging duration of aging equipment can be predicted, for equipment of any type, and the aging duration of any component on the equipment can likewise be predicted. The method can also be applied to predicting the onset time of a patient's disease, for any disease, such as the onset time of cerebral arterial thrombosis or the onset time of cancer. In the case where a patient's onset time is unknown, the method can provide effective reference information for the diagnosis and treatment process of medical personnel. Furthermore, the neural network model learns local features of the image by itself and automatically extracts the region of interest, so the region of interest does not need to be manually drawn; the approach is simple, convenient and highly flexible, and different methods can be selected for the neural network model according to different requirements.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.
According to another aspect of the embodiments of the present invention, there is also provided an image processing apparatus for implementing the above-described image processing method. As shown in fig. 14, the apparatus includes: an obtaining module 1402, configured to obtain a target image of a target object, where the target image includes a local image area where a target event occurs; a first input module 1404, configured to input the target image into an event occurrence time prediction model, and perform image feature extraction on the target image through a first convolution layer in the event occurrence time prediction model to obtain a local image feature matching the local image region, where the local image feature is used to indicate a progress of the target object in occurrence of the target event; a second input module 1406, configured to input the local image features into a second convolutional layer of the event occurrence time prediction model, and obtain target global features output by the second convolutional layer; a determining module 1408, configured to determine event occurrence time information of the target object when the target event occurs based on the target global feature.
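For illustration only, the following is a minimal sketch of how the modules above could map onto a network, assuming a PyTorch implementation; the channel counts, kernel sizes and pooling choices are assumptions made for the example and are not the architecture claimed here.

```python
# Minimal sketch matching the modules above: a first convolution stage for
# local image features, a second convolution stage for the global feature,
# and a regression head for the occurrence duration. Sizes are illustrative.
import torch
import torch.nn as nn

class EventTimeNet(nn.Module):
    def __init__(self, in_channels: int = 1):
        super().__init__()
        # "first convolution layer": extracts local image features matching
        # the local image region where the target event occurs
        self.first_conv = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # "second convolution layer": aggregates local features into a
        # target global feature
        self.second_conv = nn.Sequential(
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # fully-connected layer: maps the global feature to a predicted
        # occurrence duration
        self.fc = nn.Linear(64, 1)

    def forward(self, target_image: torch.Tensor) -> torch.Tensor:
        local_features = self.first_conv(target_image)
        global_feature = self.second_conv(local_features).flatten(1)
        return self.fc(global_feature)  # predicted occurrence duration
```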
Optionally, the apparatus is further configured to implement the inputting of the target image into an event occurrence time prediction model and the performing of image feature extraction on the target image through a first convolution layer in the event occurrence time prediction model to obtain a local image feature matching the local image region by: identifying the local image region in the target image through the first convolution layer, wherein the local image features include image features of the local image region in multiple dimensions; and determining the image features of the local image region in the multiple dimensions through the first convolution layer, wherein the target global feature is used for representing the image features of the local image region in the multiple dimensions, the multiple dimensions are matched with the target event, and the image features in the multiple dimensions are used for representing the progress of the target object in occurrence of the target event.
Optionally, the apparatus is further configured to determine the event occurrence time information of the target object when the target event occurs based on the target global feature by: inputting the target global feature into a fully-connected layer of the event occurrence time prediction model, and obtaining a predicted occurrence duration output by the fully-connected layer, wherein the predicted occurrence duration represents the duration between the time point at which the target event starts to occur in the target object and the time point at which the target image is captured, and the event occurrence time information includes the predicted occurrence duration.
Optionally, the apparatus is further configured to determine the event occurrence time information of the target object when the target event occurs based on the target global feature by: inputting the target global feature into a fully-connected layer of the event occurrence time prediction model to obtain a predicted occurrence duration output by the fully-connected layer, wherein the predicted occurrence duration represents the duration between the time point at which the target event starts to occur in the target object and the time point at which the target image is shot; and determining a predicted occurrence time point based on the predicted occurrence duration and a pre-acquired shooting time point, wherein the shooting time point is the time point at which the target image was shot, the predicted occurrence time point represents the time point at which the target event starts to occur in the target object, and the event occurrence time information includes the predicted occurrence time point.
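A small sketch of this last step follows, assuming the shooting time point is held as a datetime value and the predicted occurrence duration as a timedelta; the concrete values are illustrative only.

```python
# Sketch of deriving the predicted occurrence time point from the predicted
# occurrence duration and the pre-acquired shooting time point.
from datetime import datetime, timedelta

shooting_time_point = datetime(2021, 4, 2, 10, 30)   # illustrative value
predicted_duration = timedelta(hours=6)              # illustrative model output

predicted_occurrence_time_point = shooting_time_point - predicted_duration
print(predicted_occurrence_time_point)               # 2021-04-02 04:30:00
```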
Optionally, the apparatus is further configured to implement the inputting of the target global feature into the fully-connected layer of the event occurrence time prediction model and the obtaining of the predicted occurrence duration output by the fully-connected layer by: determining, through the fully-connected layer, the prediction probability of each duration in a preset duration set; and determining the predicted occurrence duration in the duration set according to the prediction probability of each duration.
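The following is a minimal sketch of such a classification-style head, assuming a PyTorch implementation; the candidate durations in the preset set and the 64-dimensional global feature size are illustrative assumptions carried over from the earlier sketch.

```python
# Sketch of a fully-connected head that outputs a prediction probability for
# each duration in a preset duration set, from which the predicted
# occurrence duration is read off. Values are illustrative only.
import torch
import torch.nn as nn

duration_set = torch.tensor([1.0, 3.0, 6.0, 12.0, 24.0])   # candidate hours

fc = nn.Linear(64, len(duration_set))       # fully-connected layer over the set

global_feature = torch.randn(1, 64)         # stand-in for the target global feature
probabilities = torch.softmax(fc(global_feature), dim=-1)

# Either take the most probable duration in the set ...
predicted = duration_set[probabilities.argmax(dim=-1)]
# ... or the probability-weighted expectation over the duration set.
expected = (probabilities * duration_set).sum(dim=-1)
```

Either read-out is consistent with determining the predicted occurrence duration from the per-duration prediction probabilities; which one is used is a design choice outside this sketch.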
Optionally, the apparatus is further configured to: acquire a sample image set, wherein the sample image set includes a sample image and known event occurrence time information corresponding to the sample image, the sample image is an image obtained by shooting a sample object, and the known event occurrence time information includes the actual time point at which the target event occurred in the sample object, or the duration between that actual time point and the time at which the sample image was shot; and train a sample neural network model by using the sample image set until a target loss value of the sample neural network model meets a target convergence condition, so as to obtain the event occurrence time prediction model, wherein the target loss value is a loss value determined from the event occurrence time information output by the sample neural network model and the known event occurrence time information.
Optionally, the apparatus is further configured to implement the acquiring of the sample image set by: acquiring a first sample image, a first actual occurrence time point corresponding to the first sample image, a second sample image and a target duration, wherein the first sample image and the second sample image are images obtained by shooting a first sample object, the first actual occurrence time point represents the actual time point at which the target event occurred in the first sample object, and the target duration represents the duration between the time point at which the first sample image was shot and the time point at which the second sample image was shot; and determining a second actual occurrence time point corresponding to the second sample image according to the first actual occurrence time point and the target duration, wherein the second actual occurrence time point represents the actual time point at which the target event occurred in the first sample object. The sample image set includes the first sample image and the first actual occurrence time point having a corresponding relationship, and the second sample image and the second actual occurrence time point having a corresponding relationship.
Optionally, the apparatus is further configured to implement the acquiring of the sample image set by: acquiring a third sample image and a third actual occurrence time point corresponding to the third sample image, wherein the third actual occurrence time point represents the actual time point at which the target event occurred in a second sample object; randomly flipping and/or rotating the third sample image to obtain a fourth sample image; and determining the actual occurrence time point corresponding to the fourth sample image as the third actual occurrence time point. The sample image set includes the third sample image and the third actual occurrence time point having a corresponding relationship, and the fourth sample image and the third actual occurrence time point having a corresponding relationship.
According to still another aspect of the embodiments of the present invention, there is also provided an electronic device for implementing the above image processing method, where the electronic device may be the terminal device or the server shown in fig. 1. In this embodiment, the electronic device is described by taking a server as an example. As shown in fig. 15, the electronic device includes a memory 1502 in which a computer program is stored, and a processor 1504 arranged to perform the steps of any of the above method embodiments by means of the computer program.
Optionally, in this embodiment, the electronic device may be located in at least one network device of a plurality of network devices of a computer network.
Optionally, in this embodiment, the processor may be configured to execute, by means of a computer program, the following steps:
s1, acquiring a target image of a target object, wherein the target image comprises a local image area where a target event occurs;
s2, inputting the target image into an event occurrence time prediction model, and performing image feature extraction on the target image through a first convolution layer in the event occurrence time prediction model to obtain a local image feature matching the local image region, wherein the local image feature is used to indicate the progress of the target object in occurrence of the target event;
s3, inputting the local image feature into a second convolutional layer of the event occurrence time prediction model, and obtaining a target global feature output by the second convolutional layer;
s4, determining event occurrence time information of the target object at which the target event occurs based on the target global feature.
Alternatively, it can be understood by those skilled in the art that the structure shown in fig. 15 is only illustrative, and the electronic device may also be a terminal device such as a smart phone (e.g., an Android phone, an iOS phone, etc.), a tablet computer, a palmtop computer, a Mobile Internet Device (MID), a PAD, and the like. Fig. 15 does not limit the structure of the electronic device; for example, the electronic device may also include more or fewer components (e.g., network interfaces, etc.) than shown in fig. 15, or have a different configuration from that shown in fig. 15.
The memory 1502 may be used for storing software programs and modules, such as program instructions/modules corresponding to the image processing method and apparatus in the embodiments of the present invention, and the processor 1504 executes various functional applications and data processing by running the software programs and modules stored in the memory 1502, that is, implements the image processing method described above. The memory 1502 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some instances, the memory 1502 may further include memory located remotely from the processor 1504, which may be connected to the terminal over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof. The memory 1502 may be used for storing, but is not limited to, information such as a target image obtained by photographing the target object. As an example, as shown in fig. 15, the memory 1502 may include, but is not limited to, the obtaining module 1402, the first input module 1404, the second input module 1406, and the determining module 1408 of the image processing apparatus. In addition, the memory may also include other module units of the image processing apparatus, which are not described in detail in this example.
Optionally, the transmission device 1506 is used for receiving or transmitting data via a network. Examples of the network may include a wired network and a wireless network. In one example, the transmission device 1506 includes a Network Interface Controller (NIC), which can be connected to other network devices and a router via a network cable so as to communicate with the internet or a local area network. In another example, the transmission device 1506 is a Radio Frequency (RF) module, which is used for communicating with the internet in a wireless manner.
In addition, the electronic device further includes: a display 1508 for displaying a target image obtained by photographing the target object; and a connection bus 1510 for connecting the respective module parts in the above-described electronic apparatus.
In other embodiments, the terminal device or the server may be a node in a distributed system, where the distributed system may be a blockchain system, and the blockchain system may be a distributed system formed by connecting a plurality of nodes through network communication. The nodes can form a Peer-to-Peer (P2P) network, and any type of computing device, such as a server, a terminal, or another electronic device, can become a node in the blockchain system by joining the peer-to-peer network.
According to an aspect of the application, a computer program product or a computer program is provided, which comprises computer instructions stored in a computer-readable storage medium. The processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the method provided in the various alternative implementations described above. The computer program is arranged to perform the steps of any of the above method embodiments when executed.
Alternatively, in the present embodiment, the above-mentioned computer-readable storage medium may be configured to store a computer program for executing the steps of:
s1, acquiring a target image of a target object, wherein the target image comprises a local image area where a target event occurs;
s2, inputting the target image into an event occurrence time prediction model, and performing image feature extraction on the target image through a first convolution layer in the event occurrence time prediction model to obtain a local image feature matching the local image region, wherein the local image feature is used to indicate the progress of the target object in occurrence of the target event;
s3, inputting the local image feature into a second convolutional layer of the event occurrence time prediction model, and obtaining a target global feature output by the second convolutional layer;
s4, determining event occurrence time information of the target object at which the target event occurs based on the target global feature.
Alternatively, in this embodiment, a person skilled in the art may understand that all or part of the steps in the methods of the foregoing embodiments may be implemented by a program instructing hardware associated with the terminal device, where the program may be stored in a computer-readable storage medium, and the storage medium may include: flash disks, Read-Only memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
The integrated unit in the above embodiments, if implemented in the form of a software functional unit and sold or used as a separate product, may be stored in the above computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing one or more computer devices (which may be personal computers, servers, network devices, etc.) to execute all or part of the steps of the method according to the embodiments of the present invention.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. The apparatus embodiments described above are merely illustrative; for example, the division of the units is only one type of division of logical functions, and there may be other divisions in actual implementation, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that those skilled in the art can make various improvements and modifications without departing from the principle of the present invention, and such improvements and modifications shall also fall within the protection scope of the present invention.

Claims (11)

1. An image processing method, comprising:
acquiring a target image of a target object, wherein the target image comprises a local image area where a target event occurs;
inputting the target image into an event occurrence time prediction model, and performing image feature extraction on the target image through a first convolution layer in the event occurrence time prediction model to obtain local image features matched with the local image area, wherein the local image features are used for representing the progress of the target object in the occurrence of the target event;
inputting the local image features to a second convolution layer of the event occurrence time prediction model to obtain target global features output by the second convolution layer;
and determining event occurrence time information of the target object when the target event occurs based on the target global features.
2. The method according to claim 1, wherein inputting the target image into an event occurrence time prediction model, and performing image feature extraction on the target image through a first convolution layer in the event occurrence time prediction model to obtain local image features matching the local image area comprises:
identifying the local image area in the target image through the first convolution layer, wherein the local image features include image features of the local image area in multiple dimensions;
determining image features of the local image area in the multiple dimensions through the first convolution layer, wherein the target global feature is used for representing the image features of the local image area in the multiple dimensions, the multiple dimensions are matched with the target event, and the image features in the multiple dimensions are used for representing the progress of the target event of the target object.
3. The method of claim 1, wherein the determining event occurrence time information of the target object at which the target event occurs based on the target global feature comprises:
inputting the target global feature into a fully-connected layer of the event occurrence time prediction model, and obtaining a predicted occurrence duration output by the fully-connected layer, wherein the predicted occurrence duration represents the duration between the time point at which the target event starts to occur in the target object and the time point at which the target image is shot, and the event occurrence time information includes the predicted occurrence duration.
4. The method of claim 1, wherein the determining event occurrence time information of the target object at which the target event occurs based on the target global feature comprises:
inputting the target global feature into a fully-connected layer of the event occurrence time prediction model, and obtaining a predicted occurrence duration output by the fully-connected layer, wherein the predicted occurrence duration represents the duration between the time point at which the target event starts to occur in the target object and the time point at which the target image is shot;
determining a predicted occurrence time point based on the predicted occurrence duration and a pre-acquired shooting time point, wherein the shooting time point is the time point at which the target image was shot, the predicted occurrence time point represents the time point at which the target event starts to occur in the target object, and the event occurrence time information includes the predicted occurrence time point.
5. The method according to claim 3 or 4, wherein the inputting the target global feature into a fully-connected layer of the event occurrence time prediction model to obtain the predicted occurrence duration output by the fully-connected layer comprises:
determining, through the fully-connected layer, the prediction probability of each duration in a preset duration set;
and determining the predicted occurrence duration in the duration set according to the prediction probability of each duration.
6. The method according to any one of claims 1 to 4, further comprising:
acquiring a sample image set, wherein the sample image set comprises a sample image and known event occurrence time information corresponding to the sample image, the sample image is an image obtained by shooting a sample object, and the known event occurrence time information comprises an actual time point of the sample object at which the target event occurs or comprises a duration between the actual time point of the sample object at which the target event occurs and the time of shooting the sample image;
and training a sample neural network model by using the sample image set until a target loss value of the sample neural network model meets a target convergence condition, so as to obtain the event occurrence time prediction model, wherein the target loss value is a loss value determined from the event occurrence time information output by the sample neural network model and the known event occurrence time information.
7. The method of claim 6, wherein the obtaining the set of sample images comprises:
acquiring a first sample image, a first actual occurrence time point corresponding to the first sample image, a second sample image and a target duration, wherein the first sample image and the second sample image are images obtained by shooting a first sample object, the first actual occurrence time point represents an actual time point of the sample object at which the target event occurs, and the target duration represents a duration between a time point of shooting the first sample image and a time point of shooting the second sample image;
determining a second actual occurrence time point corresponding to the second sample image according to the first actual occurrence time point and the target duration, wherein the second actual occurrence time point represents an actual time point of the sample object at which the target event occurs;
wherein the sample image set includes the first sample image and the first actual occurrence time point having a correspondence relationship, and the second sample image and the second actual occurrence time point having a correspondence relationship.
8. The method of claim 6, wherein the obtaining the set of sample images comprises:
acquiring a third sample image and a third actual occurrence time point corresponding to the third sample image, wherein the third actual occurrence time point represents an actual time point of a second sample object at which the target event occurs;
randomly flipping and/or rotating the third sample image to obtain a fourth sample image;
determining an actual occurrence time point corresponding to the fourth sample image as the third actual occurrence time point;
wherein the sample image set includes the third sample image and the third actual occurrence time point having a correspondence relationship, and the fourth sample image and the third actual occurrence time point having a correspondence relationship.
9. An image processing apparatus characterized by comprising:
the system comprises an acquisition module, a processing module and a display module, wherein the acquisition module is used for acquiring a target image of a target object, and the target image comprises a local image area where a target event occurs;
a first input module, configured to input the target image into an event occurrence time prediction model, and perform image feature extraction on the target image through a first convolution layer in the event occurrence time prediction model to obtain a local image feature matched with the local image region, where the local image feature is used to indicate a progress of the target object in occurrence of the target event;
the second input module is used for inputting the local image features to a second convolutional layer of the event occurrence time prediction model to obtain target global features output by the second convolutional layer;
and the determining module is used for determining the event occurrence time information of the target object when the target event occurs based on the target global feature.
10. A computer-readable storage medium, comprising a stored program, wherein the program when executed performs the method of any one of claims 1 to 8.
11. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to execute the method of any of claims 1 to 8 by means of the computer program.
CN202110361952.6A 2021-04-02 2021-04-02 Image processing method and apparatus, storage medium, and electronic apparatus Pending CN113705311A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110361952.6A CN113705311A (en) 2021-04-02 2021-04-02 Image processing method and apparatus, storage medium, and electronic apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110361952.6A CN113705311A (en) 2021-04-02 2021-04-02 Image processing method and apparatus, storage medium, and electronic apparatus

Publications (1)

Publication Number Publication Date
CN113705311A (en) 2021-11-26

Family

ID=78647952

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110361952.6A Pending CN113705311A (en) 2021-04-02 2021-04-02 Image processing method and apparatus, storage medium, and electronic apparatus

Country Status (1)

Country Link
CN (1) CN113705311A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023178789A1 (en) * 2022-03-21 2023-09-28 平安科技(深圳)有限公司 Disease risk estimation network optimization method and apparatus, medium, and device
CN114710228A (en) * 2022-05-31 2022-07-05 杭州闪马智擎科技有限公司 Time synchronization method and device, storage medium and electronic device
CN114710228B (en) * 2022-05-31 2022-09-09 杭州闪马智擎科技有限公司 Time synchronization method and device, storage medium and electronic device


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination