WO2021189911A1 - Method, apparatus, device, and medium for detecting the position of a target based on a video stream - Google Patents

Method, apparatus, device, and medium for detecting the position of a target based on a video stream Download PDF

Info

Publication number
WO2021189911A1
WO2021189911A1 (PCT/CN2020/131991)
Authority
WO
WIPO (PCT)
Prior art keywords
target
image
position sequence
video stream
image set
Prior art date
Application number
PCT/CN2020/131991
Other languages
English (en)
French (fr)
Inventor
徐埌
陈超
侯怡卿
詹维伟
黄凌云
刘玉宇
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 filed Critical 平安科技(深圳)有限公司
Publication of WO2021189911A1 publication Critical patent/WO2021189911A1/zh

Links

Images

Classifications

    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/72 Signal processing specially adapted for physiological signals or for diagnostic purposes
    • A61B5/7235 Details of waveform analysis
    • A61B5/7264 Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/72 Signal processing specially adapted for physiological signals or for diagnostic purposes
    • A61B5/7235 Details of waveform analysis
    • A61B5/7264 Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems
    • A61B5/7267 Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems involving training the classification device
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Definitions

  • This application relates to the field of artificial intelligence technology, and in particular to a method, device, electronic device, and computer-readable storage medium for detecting the position of a target based on a video stream.
  • In recent years, neural network models have been widely used in medical image detection, for example target detection models such as the Single Shot MultiBox Detector (SSD).
  • Although neural network models perform well in most image detection scenarios, in the medical field the lack of large-scale doctor-labeled data means that medical video detection often fails to use the image context information in the medical video, which affects the accuracy of medical video detection.
  • For example, in traditional thyroid video nodule diagnosis, the thyroid is mainly scanned in cross section and longitudinal section, the segments of the scan video in which nodules may appear are retained as still images, and the position of the thyroid nodule is then judged. The inventor realized that this process often fails to use the contextual information of the thyroid images in the thyroid video, which affects the accuracy of thyroid video nodule position detection.
  • the present application provides a method for detecting the position of a target based on a video stream, including:
  • acquiring a video stream, and performing image framing on the video stream to obtain a framed image set; detecting the target area of each framed image in the framed image set by using a pre-trained target area detection model to obtain a target image set; recognizing the target position sequence of each target image in the target image set by using a pre-trained target position sequence recognition model, screening out abnormal target position sequences from the target position sequences, and deleting the target images corresponding to the abnormal target position sequences from the target image set to obtain a standard target image set; and performing image association on all target images in the standard target image set, and identifying the target position in the video stream according to the standard target image set after image association.
  • the present application also provides a device for detecting the position of a target based on a video stream, the device including:
  • the framing module is used to obtain a video stream, and perform image framing on the video stream to obtain a framing image set;
  • the detection module is configured to use a pre-trained target area detection model to detect the target area of each sub-frame image in the sub-frame image set to obtain a target image set;
  • the recognition module is used to recognize the target position sequence of each target image in the target image set by using the pre-trained target position sequence recognition model, screen out abnormal target position sequences from the target position sequences, and delete the target images corresponding to the abnormal target position sequences from the target image set to obtain a standard target image set;
  • the association module is used to perform image association on all target images in the standard target image set and identify the target position in the video stream according to the standard target image set after image association.
  • an electronic device which includes:
  • At least one processor and,
  • a memory communicatively connected with the at least one processor; wherein,
  • the memory stores computer program instructions executable by the at least one processor, and the computer program instructions are executed by the at least one processor, so that the at least one processor can execute the following steps:
  • acquiring a video stream, and performing image framing on the video stream to obtain a framed image set; detecting the target area of each framed image in the framed image set by using a pre-trained target area detection model to obtain a target image set; recognizing the target position sequence of each target image in the target image set by using a pre-trained target position sequence recognition model, screening out abnormal target position sequences from the target position sequences, and deleting the target images corresponding to the abnormal target position sequences from the target image set to obtain a standard target image set; and performing image association on all target images in the standard target image set, and identifying the target position in the video stream according to the standard target image set after image association.
  • the present application also provides a computer-readable storage medium storing a computer program, and when the computer program is executed by a processor, the following steps are implemented:
  • acquiring a video stream, and performing image framing on the video stream to obtain a framed image set; detecting the target area of each framed image in the framed image set by using a pre-trained target area detection model to obtain a target image set; recognizing the target position sequence of each target image in the target image set by using a pre-trained target position sequence recognition model, screening out abnormal target position sequences from the target position sequences, and deleting the target images corresponding to the abnormal target position sequences from the target image set to obtain a standard target image set; and performing image association on all target images in the standard target image set, and identifying the target position in the video stream according to the standard target image set after image association.
  • FIG. 1 is a schematic flowchart of a method for detecting the position of a target based on a video stream provided by an embodiment of the application;
  • FIG. 2 is a detailed flowchart of one of the steps in the method for detecting the position of a target based on a video stream provided in FIG. 1 in the first embodiment of the application;
  • FIG. 3 is a detailed flowchart of another step in the method for detecting the position of an object based on a video stream provided in FIG. 1 in the first embodiment of the application;
  • FIG. 4 is a detailed flowchart of another step in the method for detecting the position of an object based on a video stream provided in FIG. 1 in the first embodiment of the application;
  • FIG. 5 is a schematic diagram of modules of a video stream-based target position detection device provided by an embodiment of the application.
  • FIG. 6 is a schematic diagram of the internal structure of an electronic device that implements a method for detecting the position of a target object based on a video stream provided by an embodiment of the application;
  • the embodiment of the present application provides a method for detecting the position of a target based on a video stream.
  • the execution subject of the method for detecting the position of a target object based on a video stream includes, but is not limited to, at least one of the electronic devices that can be configured to execute the method provided in the embodiments of the present application, such as a server and a terminal.
  • the method for detecting the position of a target based on a video stream may be executed by software or hardware installed on a terminal device or a server device, and the software may be a blockchain platform.
  • the server includes but is not limited to: a single server, a server cluster, a cloud server or a cloud server cluster, etc.
  • FIG. 1 for a schematic flowchart of a method for detecting a position of an object based on a video stream provided by an embodiment of the present application.
  • the method for detecting the position of a target based on a video stream includes:
  • the video stream is based on a video obtained by performing ultrasound scanning of the part to be detected.
  • the video stream is a thyroid video stream.
  • it should be understood that a video stream has a certain continuity; if target position detection, such as thyroid position detection, is performed on the entire video stream, the detection result is likely to be inaccurate; therefore, the embodiment of the present application performs image framing on the video stream to obtain a framed image set, so that target position detection is performed on each frame of the video stream, improving the accuracy of target position detection in the video stream.
  • the image framing of the video stream to obtain a framed image set includes: querying the total number of frames of the video stream; splitting the video stream into multiple framed pictures based on the total number of frames; and converting the multiple framed pictures into a picture format to obtain a framed image set.
  • the total number of frames is obtained by checking the attributes of the corresponding video stream.
  • the splitting into multiple framed pictures is implemented through a while statement; for example, one frame is set as one picture through the while statement.
  • the picture format is jpg format.
  • the video stream may be stored in a blockchain node.
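  • As a reading aid only (not part of the patent text): the framing described above (query the total frame count, split with a while statement, one frame per jpg picture) can be sketched in Python with OpenCV; the function and file names here are illustrative assumptions:

    import cv2

    def split_video_to_frames(video_path, out_dir):
        """Split a video stream into per-frame jpg pictures (one frame = one picture)."""
        cap = cv2.VideoCapture(video_path)
        total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))  # query the total frame count
        framed_image_set = []
        index = 0
        while index < total_frames:  # while-statement framing, as described
            ok, frame = cap.read()
            if not ok:
                break
            path = f"{out_dir}/frame_{index:06d}.jpg"  # jpg picture format
            cv2.imwrite(path, frame)
            framed_image_set.append(path)
            index += 1
        cap.release()
        return framed_image_set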
  • the target area detection model includes a YOLOv3 network, and the YOLOv3 network is used to detect the target area of an image.
  • in the embodiment of the present application, the YOLOv3 network is used to detect the target area in each framed image.
  • the target area detection model includes: a convolutional layer, a pyramid pooling layer, a fusion layer, and so on.
  • the use of the pre-trained target area detection model to detect the target area of each framed image in the framed image set to obtain the target image set includes:
  • S20, perform a convolution operation on the framed image by using the convolution layer to obtain a feature image;
  • S21, perform a dimensionality reduction operation on the feature image by using the pyramid pooling layer (Spatial Pyramid Pooling, SPP) to obtain a standard feature image;
  • S22, fuse the bottom-layer features of the framed image with the standard feature image by using the fusion layer to obtain a target feature image;
  • S23, output the detection result of the target feature image by using the activation function of the target area detection model;
  • S24, screen out, according to the detection result, the framed images in which a target area exists, to obtain a target image set.
  • the convolution layer performs convolution operations on the image to achieve feature extraction; the pyramid pooling layer performs size reduction on the feature image, which avoids misdetection caused by cropping, scaling, and similar problems during image feature extraction; the fusion layer fuses the bottom-layer features of the image into the extracted image features, which reduces the influence of image grayscale changes caused by different gains.
  • the bottom layer feature refers to the basic features in the framed image, such as color, length, width, etc.
  • preferably, the fusion in the embodiment of the present application is implemented by the CSP (Cross-Stage Partial connections) module in the fusion layer.
  • the activation function is given by a formula (rendered as an image in the published description) in which y′_i represents the detection result of the i-th target feature image and s represents the target feature image.
  • in a preferred implementation of this application, the detection result includes x, y, height, width, and category, where x and y represent the center point of the target feature image and the category indicates whether the target feature image is a target area: category 0 means it is not a target area, and category 1 means it is a target area.
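  • For illustration only (the output layout below is an assumption, not the patent's code): with per-frame detection results of the form (x, y, height, width, category) and category 1 marking a target area, the screening of framed images that contain a target area could look like:

    def filter_target_images(frame_paths, detections):
        # detections: one list of (x, y, height, width, category) tuples per frame,
        # in the same order as frame_paths (assumed layout).
        target_image_set = []
        for path, dets in zip(frame_paths, detections):
            # keep the frame if any detection marks a target area (category == 1)
            if any(category == 1 for (_x, _y, _h, _w, category) in dets):
                target_image_set.append(path)
        return target_image_set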
  • the target position sequence recognition model includes a Long Short-Term Memory (LSTM) model; the LSTM model is a recurrent neural network over time that includes an input gate, a forget gate, and an output gate.
  • in this application, the LSTM model is used to recognize the target position sequence of each target image in the target image set so as to identify abnormal target images, helping the user better judge the distribution of nodules in the video stream and thereby improving the accuracy of video-stream-based target position detection.
  • the use of the pre-trained target position sequence recognition model to recognize the target position sequence of each target image in the target image set includes:
  • S30, calculate the state value of the target image through the input gate;
  • S31, calculate the activation value of the target image through the forget gate;
  • S32, calculate the state update value of the target image according to the state value and the activation value;
  • S33, calculate the initial position sequence of the state update value by using the output gate;
  • S34, calculate the loss value between the initial position sequence and the corresponding target image label by using the loss function in the target position sequence recognition model, and select the initial position sequences whose loss value is less than a preset threshold to obtain the target position sequence of the corresponding target image.
  • the state value is calculated by a formula (rendered as an image in the published description) in which i_t represents the state value, w_i represents the activation factor of the input gate, h_{t-1} represents the peak value of the target image at time t-1 of the input gate, x_t represents the target image input at time t, and b_i represents the weight of the cell unit in the input gate.
  • the activation value is calculated by a formula (rendered as an image in the published description) in which f_t represents the activation value, w_f represents the activation factor of the forget gate, h_{t-1} represents the peak value of the target image at time t-1 of the forget gate, x_t represents the target image input at time t, and b_f represents the weight of the cell unit in the forget gate.
  • the state update value is calculated by a formula (rendered as an image in the published description) in which c_t represents the state update value, h_{t-1} represents the peak value of the target image at time t-1 of the input gate, and a further symbol represents the peak value of the target image at time t-1 of the forget gate.
  • the initial position sequence is calculated as o_t = tanh(c_t), where o_t represents the initial position sequence, tanh represents the activation function of the output gate, and c_t represents the state update value.
  • the loss function is a softmax function, and the target image label refers to the target image position sequence marked in advance by the user in the target image; further, in this application, the initial position sequences whose loss value is less than the preset threshold are selected as the target position sequences, so as to filter out abnormal target images and improve the accuracy of subsequent video-stream-based target position detection.
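  • The gate formulas themselves appear only as images in the source; for orientation, the standard LSTM forms suggested by the variable glossary above are, in LaTeX (an assumption; only the last line is stated explicitly in the text, and \tilde{c}_t is the candidate cell state of the standard LSTM, which the glossary does not name):

    i_t = \sigma(w_i \cdot [h_{t-1}, x_t] + b_i)     % state value (input gate)
    f_t = \sigma(w_f \cdot [h_{t-1}, x_t] + b_f)     % activation value (forget gate)
    c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t  % state update value (standard form)
    o_t = \tanh(c_t)                                 % initial position sequence (as given)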
  • an abnormal target position sequence is selected from the target position sequence, and the target image corresponding to the abnormal target position sequence is deleted from the target image set to obtain a standard target image set.
  • for example, suppose the target image set contains 100 target images: the target position sequence recognition model recognizes that in 80 target images the target position sequence of the target area is at the upper left of the corresponding image, in 10 target images it is at the upper right, and in 10 target images no target area exists;
  • the position sequences at the upper right and with no target area can then be identified as abnormal, so the corresponding target images can be deleted, improving the accuracy of video-stream-based target position detection;
  • it should also be noted that if multiple target areas exist in one target image, the correct target position sequence of each target area in that image can be identified by the LSTM model.
  • all the standard target images in the standard target image set are image-associated, and the nodule position of the video stream is identified according to the standard target image set after image association.
  • the target association function is the known mean square error regression function; based on the image association, the distribution of nodules in the video stream can be identified well, helping the user view the position in the video stream where nodules are most prominent.
  • in summary, the embodiment of the present application first performs image framing on the acquired video stream to obtain a framed image set, so that nodule detection is performed on each frame of the video stream, improving the nodule detection accuracy of the video stream; secondly, it detects the target area of each framed image in the framed image set to obtain a target image set, recognizes the target position sequence of each target image in the target image set, and deletes the target images corresponding to abnormal target position sequences to obtain a standard target image set, which makes good use of the context information of the target position sequences in the target images to identify target images with abnormal target position sequences, thereby improving the accuracy of nodule detection in the video stream; further, it performs image association on all standard target images in the standard target image set and identifies the nodule positions of the video stream according to the standard target image set after image association, so that the distribution of nodule positions in the video stream can be viewed well, helping the user find the most prominent nodules in the video stream. Therefore, the method for detecting the position of a target based on a video stream proposed in this application can improve the accuracy of video-stream-based target position detection.
  • as shown in FIG. 5, it is a functional module diagram of the device for detecting the position of a target based on a video stream of the present application.
  • the device 100 for detecting the position of an object based on a video stream described in this application can be installed in an electronic device.
  • according to the implemented functions, the device for detecting the position of a target based on a video stream may include a framing module 101, a detection module 102, a recognition module 103, and an association module 104.
  • the module described in the present invention can also be called a unit, which refers to a series of computer program segments that can be executed by the processor of an electronic device and can complete fixed functions, and are stored in the memory of the electronic device.
  • each module/unit is as follows:
  • the framing module 101 is configured to obtain a video stream, and perform image framing on the video stream to obtain a framed image set.
  • the video stream is based on a video obtained by performing ultrasound scanning of the part to be detected.
  • the video stream is a thyroid video stream.
  • in the embodiment of the present application, the framing module 101 performs image framing on the video stream to obtain a framed image set, so that target position detection is performed on each frame of the video stream, improving the accuracy of target position detection in the video stream.
  • in detail, to perform image framing on the video stream and obtain a framed image set, the framing module 101 executes the following steps:
  • Step I Query the total number of frames of the video stream
  • Step II Based on the total number of frames, divide the video stream into multiple framed pictures;
  • Step III Convert the multiple framed pictures into a picture format to obtain a framed image set.
  • the total number of frames is obtained by checking the attributes of the corresponding video stream.
  • the splitting into multiple framed pictures is implemented through a while statement; for example, one frame is set as one picture through the while statement.
  • the picture format is jpg format.
  • the framed image set may also be stored in a blockchain node.
  • the detection module 102 is configured to use a pre-trained target area detection model to detect the target area of each sub-frame image in the sub-frame image set to obtain a target image set.
  • the target area detection model includes a YOLOv3 network, and the YOLOv3 network is used to detect the target area of an image.
  • in the embodiment of the present application, the YOLOv3 network is used to detect the target area in each framed image.
  • the target area detection model includes: a convolutional layer, a pyramid pooling layer, a fusion layer, and so on.
  • in detail, to detect the target area of each framed image in the framed image set by using the pre-trained target area detection model and obtain a target image set, the detection module 102 executes the following steps:
  • Step A Use the convolution layer to perform a convolution operation on the framed image to obtain a characteristic image
  • Step B, use the pyramid pooling layer (Spatial Pyramid Pooling, SPP) to perform a dimensionality reduction operation on the feature image to obtain a standard feature image;
  • Step C Use the fusion layer to fuse the bottom layer features of the framed image with the standard feature image to obtain a target feature image
  • Step D output the detection result of the target feature image by using the activation function of the target region detection model
  • Step E According to the detection result, filter out the framed images in the target area from the framed images to obtain a target image set.
  • the convolution layer performs convolution operations on the image to achieve feature extraction; the pyramid pooling layer performs size reduction on the feature image, which avoids misdetection caused by cropping, scaling, and similar problems during image feature extraction; the fusion layer fuses the bottom-layer features of the image into the extracted image features, which reduces the influence of image grayscale changes caused by different gains.
  • the bottom layer feature refers to the basic features in the framed image, such as color, length, width, etc.
  • preferably, the fusion in the embodiment of the present application is implemented by the CSP (Cross Stage Partial) module in the fusion layer.
  • the activation function is given by a formula (rendered as an image in the published description) in which y′_i represents the detection result of the i-th target feature image and s represents the target feature image.
  • in a preferred implementation of this application, the detection result includes x, y, height, width, and category, where x and y represent the center point of the target feature image and the category indicates whether the target feature image is a target area: category 0 means it is not a target area, and category 1 means it is a target area.
  • the recognition module 103 is configured to recognize the target position sequence of each target image in the target image set by using a pre-trained target position sequence recognition model, screen out abnormal target position sequences from the target position sequences, and delete the target images corresponding to the abnormal target position sequences from the target image set to obtain a standard target image set.
  • the target position sequence recognition model includes a Long Short-Term Memory (LSTM) model; the LSTM model is a recurrent neural network over time that includes an input gate, a forget gate, and an output gate.
  • in this application, the LSTM model is used to recognize the target position sequence of each target image in the target image set so as to identify abnormal target images, helping the user better judge the distribution of nodules in the video stream and thereby improving the accuracy of video-stream-based target position detection.
  • in detail, to recognize the target position sequence of each target image in the target image set by using the pre-trained target position sequence recognition model, the recognition module 103 executes the following steps:
  • Step a Calculate the state value of the target image through the input gate
  • Step b Calculate the activation value of the target image through the forget gate
  • Step c Calculate the state update value of the target image according to the state value and the activation value
  • Step d Use the output gate to calculate the initial position sequence of the state update value.
  • Step e Use the loss function in the target position sequence recognition model to calculate the loss value of the initial position sequence and the corresponding target image label, and select the initial position sequence whose loss value is less than a preset threshold to obtain the target object corresponding to the target image Position sequence.
  • the state value is calculated by a formula (rendered as an image in the published description) in which i_t represents the state value, w_i represents the activation factor of the input gate, h_{t-1} represents the peak value of the target image at time t-1 of the input gate, x_t represents the target image input at time t, and b_i represents the weight of the cell unit in the input gate.
  • the activation value is calculated by a formula (rendered as an image in the published description) in which f_t represents the activation value, w_f represents the activation factor of the forget gate, h_{t-1} represents the peak value of the target image at time t-1 of the forget gate, x_t represents the target image input at time t, and b_f represents the weight of the cell unit in the forget gate.
  • the state update value is calculated by a formula (rendered as an image in the published description) in which c_t represents the state update value, h_{t-1} represents the peak value of the target image at time t-1 of the input gate, and a further symbol represents the peak value of the target image at time t-1 of the forget gate.
  • the initial position sequence is calculated as o_t = tanh(c_t), where o_t represents the initial position sequence, tanh represents the activation function of the output gate, and c_t represents the state update value.
  • the loss function is a softmax function, and the target image label refers to the target image position sequence marked in advance by the user in the target image; further, in this application, the initial position sequences whose loss value is less than the preset threshold are selected as the target position sequences, so as to filter out abnormal target images and improve the accuracy of subsequent video-stream-based target position detection.
  • further, in the embodiment of the present application, the recognition module 103 screens out abnormal target position sequences from the target position sequences and deletes the target images corresponding to them from the target image set to obtain a standard target image set.
  • for example, suppose the target image set contains 100 target images: the target position sequence recognition model recognizes that in 80 target images the target position sequence of the target area is at the upper left of the corresponding image, in 10 target images it is at the upper right, and in 10 target images no target area exists;
  • the position sequences at the upper right and with no target area can then be identified as abnormal, so the corresponding target images can be deleted, improving the accuracy of video-stream-based target position detection;
  • it should also be noted that if multiple target areas exist in one target image, the correct target position sequence of each target area in that image can be identified by the LSTM model.
  • the associating module 104 is configured to associate all target images in the standard target image set with images, and identify the position of the target in the video stream according to the standard target image set after image association.
  • in at least one embodiment of this application, the association module 104 performs image association on all standard target images in the standard target image set and identifies the nodule positions of the video stream according to the standard target image set after image association.
  • the target association function is the known mean square error regression function; based on the image association, the distribution of nodules in the video stream can be identified well, helping the user view the position in the video stream where nodules are most prominent.
  • in summary, the embodiment of the present application first performs image framing on the acquired video stream to obtain a framed image set, so that nodule detection is performed on each frame of the video stream, improving the nodule detection accuracy of the video stream; secondly, it detects the target area of each framed image in the framed image set to obtain a target image set, recognizes the target position sequence of each target image in the target image set, and deletes the target images corresponding to abnormal target position sequences to obtain a standard target image set, which makes good use of the context information of the target position sequences in the target images to identify target images with abnormal target position sequences, thereby improving the accuracy of nodule detection in the video stream; further, it performs image association on all standard target images in the standard target image set and identifies the nodule positions of the video stream according to the standard target image set after image association. Therefore, the device for detecting the position of a target based on a video stream proposed in the present application can improve the accuracy of video-stream-based target position detection.
  • as shown in FIG. 6, it is a schematic structural diagram of the electronic device that implements the method for detecting the position of a target based on a video stream according to the present application.
  • the electronic device 1 may include a processor 10, a memory 11, and a bus, and may further include a computer program stored in the memory 11 and runnable on the processor 10, such as a video-stream-based target position detection program 12.
  • the memory 11 may be volatile or non-volatile, and includes at least one type of readable storage medium, such as flash memory, a mobile hard disk, a multimedia card, a card-type memory (for example, SD or DX memory), a magnetic memory, a magnetic disk, an optical disk, etc.
  • the memory 11 may be an internal storage unit of the electronic device 1 in some embodiments, for example, a mobile hard disk of the electronic device 1.
  • the memory 11 may also be an external storage device of the electronic device 1 in other embodiments, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash Card equipped on the electronic device 1.
  • the memory 11 may also include both an internal storage unit of the electronic device 1 and an external storage device.
  • the memory 11 can be used not only to store application software and various data installed in the electronic device 1, such as a code for detecting the position of an object based on a video stream, but also to temporarily store data that has been output or will be output.
  • the processor 10 may be composed of integrated circuits in some embodiments, for example a single packaged integrated circuit, or multiple integrated circuits packaged with the same or different functions, including one or more combinations of central processing units (CPU), microprocessors, digital processing chips, graphics processors, and various control chips.
  • the processor 10 is the control core (Control Unit) of the electronic device; it uses various interfaces and lines to connect the components of the entire electronic device, and executes the various functions of the electronic device 1 and processes data by running or executing programs or modules stored in the memory 11 (for example, executing video-stream-based target position detection) and calling data stored in the memory 11.
  • the bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like.
  • the bus can be divided into address bus, data bus, control bus and so on.
  • the bus is configured to implement connection and communication between the memory 11 and at least one processor 10 and the like.
  • FIG. 6 only shows an electronic device with some of the components; those skilled in the art will understand that the structure shown in FIG. 6 does not limit the electronic device 1, which may include fewer or more components than shown, a combination of certain components, or a different arrangement of components.
  • for example, although not shown, the electronic device 1 may also include a power source (such as a battery) for supplying power to the components; preferably, the power source may be logically connected to the at least one processor 10 through a power management device, so that functions such as charge management, discharge management, and power consumption management are implemented through the power management device.
  • the power supply may also include one or more DC or AC power supplies, recharging devices, power failure detection circuits, power converters or inverters, power supply status indicators and other arbitrary components.
  • the electronic device 1 may also include various sensors, Bluetooth modules, Wi-Fi modules, etc., which will not be repeated here.
  • the electronic device 1 may also include a network interface.
  • the network interface may include a wired interface and/or a wireless interface (such as a Wi-Fi interface or a Bluetooth interface), and is usually used to establish a communication connection between the electronic device 1 and other electronic devices.
  • the electronic device 1 may also include a user interface.
  • the user interface may be a display (Display) and an input unit (such as a keyboard (Keyboard)).
  • the user interface may also be a standard wired interface or a wireless interface.
  • the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode, organic light-emitting diode) touch device, etc.
  • the display can also be appropriately called a display screen or a display unit, which is used to display the information processed in the electronic device 1 and to display a visualized user interface.
  • the video-stream-based target position detection program 12 stored in the memory 11 of the electronic device 1 is a combination of multiple instructions which, when run in the processor 10, can realize: acquiring a video stream, and performing image framing on the video stream to obtain a framed image set; detecting the target area of each framed image in the framed image set by using a pre-trained target area detection model to obtain a target image set; recognizing the target position sequence of each target image in the target image set by using a pre-trained target position sequence recognition model, screening out abnormal target position sequences, and deleting the target images corresponding to the abnormal target position sequences from the target image set to obtain a standard target image set; and performing image association on all target images in the standard target image set, and identifying the target position in the video stream according to the standard target image set after image association.
  • if the integrated modules/units of the electronic device 1 are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium, which may be volatile or non-volatile.
  • the computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a mobile hard disk, a magnetic disk, an optical disk, a computer memory, or a read-only memory (ROM).
  • modules described as separate components may or may not be physically separated, and the components displayed as modules may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the modules can be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional modules in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit may be implemented in the form of hardware, or may be implemented in the form of hardware plus software functional modules.
  • the blockchain referred to in this application is a new application mode of computer technology such as distributed data storage, point-to-point transmission, consensus mechanism, and encryption algorithm.
  • a blockchain is essentially a decentralized database: a chain of data blocks generated in association using cryptographic methods, where each data block contains a batch of network transaction information used to verify the validity of the information (anti-counterfeiting) and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Public Health (AREA)
  • Surgery (AREA)
  • Animal Behavior & Ethology (AREA)
  • Pathology (AREA)
  • Medical Informatics (AREA)
  • Veterinary Medicine (AREA)
  • Heart & Thoracic Surgery (AREA)
  • General Physics & Mathematics (AREA)
  • Physiology (AREA)
  • Psychiatry (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Fuzzy Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Image Analysis (AREA)

Abstract

This application relates to artificial intelligence technology and discloses a method for detecting the position of a target based on a video stream, including: acquiring a video stream and performing image framing on it to obtain a framed image set; detecting the target areas of the framed image set with a target area detection model to obtain a target image set; recognizing the target position sequences of the target image set with a target position sequence recognition model and, according to the target position sequences, deleting from the target image set the target images corresponding to abnormal target position sequences to obtain a standard target image set; performing image association on all target images in the standard target image set and identifying the target position according to the standard target images after image association. This application also relates to blockchain technology; the video stream may be stored in a blockchain. This application can be applied to the position detection of thyroid nodules and can improve the accuracy of video-stream-based target position detection.

Description

Method, apparatus, device, and medium for detecting the position of a target based on a video stream
This application claims priority to the Chinese patent application filed with the China Patent Office on October 12, 2020, with application number CN202011086228.9 and entitled "Method, apparatus, device, and medium for detecting the position of a target based on a video stream", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of artificial intelligence technology, and in particular to a method, an apparatus, an electronic device, and a computer-readable storage medium for detecting the position of a target based on a video stream.
Background
In recent years, neural network models have been widely used in medical image detection, for example target detection models such as the Single Shot MultiBox Detector (SSD). Although neural network models perform well in most image detection scenarios, in the medical field the lack of large-scale doctor-labeled data means that medical video detection often fails to use the image context information in the medical video, which affects the accuracy of medical video detection. For example, in traditional thyroid video nodule diagnosis, the thyroid is mainly scanned in cross section and longitudinal section, the segments of the scan video in which nodules may appear are retained as still images, and the position of the thyroid nodule is then judged. The inventor realized that this process often fails to use the contextual information of the thyroid images in the thyroid video, which affects the accuracy of thyroid video nodule position detection.
Summary
This application provides a method for detecting the position of a target based on a video stream, including:
acquiring a video stream, and performing image framing on the video stream to obtain a framed image set;
detecting the target area of each framed image in the framed image set by using a pre-trained target area detection model to obtain a target image set;
recognizing the target position sequence of each target image in the target image set by using a pre-trained target position sequence recognition model, screening out abnormal target position sequences from the target position sequences, and deleting the target images corresponding to the abnormal target position sequences from the target image set to obtain a standard target image set;
performing image association on all target images in the standard target image set, and identifying the target position in the video stream according to the standard target image set after image association.
This application also provides an apparatus for detecting the position of a target based on a video stream, the apparatus including:
a framing module, configured to acquire a video stream and perform image framing on the video stream to obtain a framed image set;
a detection module, configured to detect the target area of each framed image in the framed image set by using a pre-trained target area detection model to obtain a target image set;
a recognition module, configured to recognize the target position sequence of each target image in the target image set by using a pre-trained target position sequence recognition model, screen out abnormal target position sequences from the target position sequences, and delete the target images corresponding to the abnormal target position sequences from the target image set to obtain a standard target image set;
an association module, configured to perform image association on all target images in the standard target image set and identify the target position in the video stream according to the standard target image set after image association.
To solve the above problems, this application also provides an electronic device, including:
at least one processor; and
a memory communicatively connected with the at least one processor; wherein
the memory stores computer program instructions executable by the at least one processor, and the computer program instructions are executed by the at least one processor so that the at least one processor can perform the following steps:
acquiring a video stream, and performing image framing on the video stream to obtain a framed image set;
detecting the target area of each framed image in the framed image set by using a pre-trained target area detection model to obtain a target image set;
recognizing the target position sequence of each target image in the target image set by using a pre-trained target position sequence recognition model, screening out abnormal target position sequences from the target position sequences, and deleting the target images corresponding to the abnormal target position sequences from the target image set to obtain a standard target image set;
performing image association on all target images in the standard target image set, and identifying the target position in the video stream according to the standard target image set after image association.
To solve the above problems, this application also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the following steps:
acquiring a video stream, and performing image framing on the video stream to obtain a framed image set;
detecting the target area of each framed image in the framed image set by using a pre-trained target area detection model to obtain a target image set;
recognizing the target position sequence of each target image in the target image set by using a pre-trained target position sequence recognition model, screening out abnormal target position sequences from the target position sequences, and deleting the target images corresponding to the abnormal target position sequences from the target image set to obtain a standard target image set;
performing image association on all target images in the standard target image set, and identifying the target position in the video stream according to the standard target image set after image association.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of a method for detecting the position of a target based on a video stream provided by an embodiment of this application;
FIG. 2 is a detailed flowchart of one step of the method provided in FIG. 1 in an embodiment of this application;
FIG. 3 is a detailed flowchart of another step of the method provided in FIG. 1 in an embodiment of this application;
FIG. 4 is a detailed flowchart of another step of the method provided in FIG. 1 in an embodiment of this application;
FIG. 5 is a schematic module diagram of an apparatus for detecting the position of a target based on a video stream provided by an embodiment of this application;
FIG. 6 is a schematic diagram of the internal structure of an electronic device implementing the method for detecting the position of a target based on a video stream provided by an embodiment of this application;
The realization of the purpose, functional features, and advantages of this application will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described here are only used to explain this application and are not used to limit it.
The embodiment of this application provides a method for detecting the position of a target based on a video stream. The execution subject of the method includes, but is not limited to, at least one of the electronic devices that can be configured to execute the method provided by the embodiments of this application, such as a server and a terminal. In other words, the method may be executed by software or hardware installed on a terminal device or a server device, and the software may be a blockchain platform. The server includes, but is not limited to, a single server, a server cluster, a cloud server, a cloud server cluster, and the like.
Referring to FIG. 1, a schematic flowchart of a method for detecting the position of a target based on a video stream provided by an embodiment of this application is shown. In this embodiment, the method includes:
S1. Acquire a video stream, and perform image framing on the video stream to obtain a framed image set.
In a preferred embodiment of this application, the video stream is a video obtained by performing ultrasound scanning of the part to be detected; in this embodiment, the video stream is a thyroid video stream.
It should be understood that a video stream has a certain continuity. If target position detection, such as thyroid position detection, is performed on the entire video stream, the detection result is likely to be inaccurate. Therefore, the embodiment of this application performs image framing on the video stream to obtain a framed image set, so that target position detection is performed on each frame of the video stream, improving the accuracy of target position detection in the video stream.
In detail, referring to FIG. 2, performing image framing on the video stream to obtain a framed image set includes:
S10. Query the total number of frames of the video stream;
S11. Based on the total number of frames, split the video stream into multiple framed pictures;
S12. Convert the multiple framed pictures into a picture format to obtain a framed image set.
In a preferred embodiment, the total number of frames is obtained by checking the attributes of the corresponding video stream. In a preferred embodiment, the splitting into multiple framed pictures is implemented through a while statement, for example by setting one frame as one picture through the while statement. In a preferred embodiment, the picture format is the jpg format.
Further, to ensure the security and privacy of the video stream, the video stream may be stored in a blockchain node.
S2. Detect the target area of each framed image in the framed image set by using a pre-trained target area detection model to obtain a target image set.
In a preferred embodiment of this application, the target area detection model includes a YOLOv3 network, which is used for the detection of target areas in images; in this embodiment, the YOLOv3 network is used to detect the target area in each framed image.
Further, the target area detection model includes a convolution layer, a pyramid pooling layer, a fusion layer, and the like.
In detail, referring to FIG. 3, detecting the target area of each framed image in the framed image set by using the pre-trained target area detection model to obtain a target image set includes:
S20. Perform a convolution operation on the framed image by using the convolution layer to obtain a feature image;
S21. Perform a dimensionality reduction operation on the feature image by using the pyramid pooling layer (Spatial Pyramid Pooling, SPP) to obtain a standard feature image;
S22. Fuse the bottom-layer features of the framed image with the standard feature image by using the fusion layer to obtain a target feature image;
S23. Output the detection result of the target feature image by using the activation function of the target area detection model;
S24. According to the detection result, screen out the framed images in which a target area exists to obtain a target image set.
The convolution layer performs convolution operations on the image to achieve feature extraction; the pyramid pooling layer performs size reduction on the feature image, which avoids nodule misdetection caused by cropping, scaling, and similar problems during image feature extraction; the fusion layer fuses the bottom-layer features of the image into the extracted image features, which reduces the influence of image grayscale changes caused by different gains.
In a preferred example, the bottom-layer features refer to the basic features of the framed image, such as color, length, and width. Preferably, in this embodiment, the fusion is implemented by the CSP (Cross-Stage Partial connections) module in the fusion layer.
In a preferred embodiment, the activation function includes:
(formula image PCTCN2020131991-appb-000001)
where y′_i represents the detection result of the i-th target feature image and s represents the target feature image.
Preferably, in a preferred implementation of this application, the detection result includes x, y, height, width, and category, where x and y represent the center point of the target feature image and the category indicates whether the target feature image is a target area: category 0 means it is not a target area, and category 1 means it is a target area.
S3. Recognize the target position sequence of each target image in the target image set by using a pre-trained target position sequence recognition model, screen out abnormal target position sequences from the target position sequences, and delete the target images corresponding to the abnormal target position sequences from the target image set to obtain a standard target image set.
In a preferred embodiment of this application, the target position sequence recognition model includes a Long Short-Term Memory (LSTM) model; the LSTM model is a recurrent neural network over time that includes an input gate, a forget gate, and an output gate.
In this application, the LSTM model is used to recognize the target position sequence of each target image in the target image set so as to identify abnormal target images, helping the user better judge the distribution of nodules in the video stream and thereby improving the accuracy of video-stream-based target position detection.
In detail, referring to FIG. 4, recognizing the target position sequence of each target image in the target image set by using the pre-trained target position sequence recognition model includes:
S30. Calculate the state value of the target image through the input gate;
S31. Calculate the activation value of the target image through the forget gate;
S32. Calculate the state update value of the target image according to the state value and the activation value;
S33. Calculate the initial position sequence of the state update value by using the output gate;
S34. Calculate the loss value between the initial position sequence and the corresponding target image label by using the loss function in the target position sequence recognition model, and select the initial position sequences whose loss value is less than a preset threshold to obtain the target position sequence of the corresponding target image.
In an optional embodiment, the state value is calculated as follows:
(formula image PCTCN2020131991-appb-000002)
where i_t represents the state value, the symbol given in formula image PCTCN2020131991-appb-000003 represents the bias of the cell unit in the input gate, w_i represents the activation factor of the input gate, h_{t-1} represents the peak value of the target image at time t-1 of the input gate, x_t represents the target image at time t, and b_i represents the weight of the cell unit in the input gate.
In an optional embodiment, the activation value is calculated as follows:
(formula image PCTCN2020131991-appb-000004)
where f_t represents the activation value, the symbol given in formula image PCTCN2020131991-appb-000005 represents the bias of the cell unit in the forget gate, w_f represents the activation factor of the forget gate, the symbol given in formula image PCTCN2020131991-appb-000006 represents the peak value of the target image at time t-1 of the forget gate, x_t represents the target image input at time t, and b_f represents the weight of the cell unit in the forget gate.
In an optional embodiment, the state update value is calculated as follows:
(formula image PCTCN2020131991-appb-000007)
where c_t represents the state update value, h_{t-1} represents the peak value of the target image at time t-1 of the input gate, and the symbol given in formula image PCTCN2020131991-appb-000008 represents the peak value of the target image at time t-1 of the forget gate.
In an optional embodiment, the initial position sequence is calculated as:
o_t = tanh(c_t)
where o_t represents the initial position sequence, tanh represents the activation function of the output gate, and c_t represents the state update value.
In an optional embodiment, the loss function is a softmax function, and the target image label refers to the target image position sequence marked in the target image by the user in advance. Further, in this application, the initial position sequences whose loss value is less than the preset threshold are selected as the target position sequences, so as to filter out abnormal target images and improve the accuracy of subsequent video-stream-based target position detection.
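For illustration only (the loss form and data layout are assumptions drawn from the softmax description above, not the patent's code), the selection of initial position sequences whose loss falls below the preset threshold might be sketched as:

    import numpy as np

    def select_position_sequences(sequence_logits, label_ids, threshold):
        # sequence_logits: one np.ndarray logit vector per target image (assumed layout);
        # label_ids: index of the user-labelled position sequence for each image.
        kept = []
        for i, (logits, label) in enumerate(zip(sequence_logits, label_ids)):
            probs = np.exp(logits - logits.max())
            probs /= probs.sum()                  # softmax over candidate position sequences
            loss = -np.log(probs[label] + 1e-12)  # cross-entropy against the label
            if loss < threshold:                  # keep sequences under the preset threshold
                kept.append(i)
        return kept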
Further, in the embodiment of this application, abnormal target position sequences are screened out from the target position sequences, and the target images corresponding to them are deleted from the target image set to obtain a standard target image set.
For example, suppose the target image set contains 100 target images: the target position sequence recognition model recognizes that in 80 target images the target position sequence of the target area is at the upper left of the corresponding image, in 10 target images it is at the upper right, and in 10 target images no target area exists. The position sequences at the upper right and with no target area can then be identified as abnormal, so the corresponding target images can be deleted, improving the accuracy of video-stream-based target position detection. It should also be noted that if multiple target areas exist in one target image, the correct target position sequence of each target area in that image can be identified by the LSTM model.
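The 100-image example amounts to keeping the dominant position sequence and discarding the rest; a minimal sketch of that screening (the majority rule here is an assumption inferred from the example, not a stated algorithm):

    from collections import Counter

    def remove_abnormal_images(target_images, position_sequences):
        # position_sequences: one recognized sequence label per image,
        # e.g. "upper-left", "upper-right", "none" (illustrative values).
        dominant, _count = Counter(position_sequences).most_common(1)[0]
        return [img for img, seq in zip(target_images, position_sequences)
                if seq == dominant]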
S4. Perform image association on all target images in the standard target image set, and identify the target position in the video stream according to the standard target image set after image association.
In at least one embodiment of this application, all standard target images in the standard target image set are image-associated, and the nodule positions of the video stream are identified according to the standard target image set after image association.
The target association function is the known mean square error regression function. Based on the image association, the distribution of nodules in the video stream can be identified well, helping the user view the position in the video stream where nodules are most prominent.
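One plausible reading of this step (an assumption; the text only names a mean square error regression function) is that consecutive standard target images are linked by the MSE between their target-region coordinates, with smaller error meaning stronger association:

    def region_mse(a, b):
        # a, b: (x, y, height, width) vectors of two target regions
        return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)

    def associate_images(regions):
        # associate each standard target image with its successor by coordinate MSE
        return [(i, i + 1, region_mse(regions[i], regions[i + 1]))
                for i in range(len(regions) - 1)]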
In summary, the embodiment of this application first performs image framing on the acquired video stream to obtain a framed image set, so that nodule detection is performed on each frame of the video stream, improving the nodule detection accuracy of the video stream; secondly, it detects the target area of each framed image in the framed image set to obtain a target image set, recognizes the target position sequence of each target image in the target image set, and deletes the target images corresponding to abnormal target position sequences to obtain a standard target image set, which makes good use of the context information of the target position sequences in the target images to identify target images with abnormal target position sequences, thereby improving the accuracy of nodule detection in the video stream; further, it performs image association on all standard target images in the standard target image set and identifies the nodule positions of the video stream according to the standard target image set after image association, so that the distribution of nodule positions in the video stream can be viewed well, helping the user find the most prominent nodules in the video stream. Therefore, the method for detecting the position of a target based on a video stream proposed by this application can improve the accuracy of video-stream-based target position detection.
As shown in FIG. 5, it is a functional module diagram of the apparatus for detecting the position of a target based on a video stream of this application.
The apparatus 100 for detecting the position of a target based on a video stream described in this application can be installed in an electronic device. According to the implemented functions, the apparatus may include a framing module 101, a detection module 102, a recognition module 103, and an association module 104. The modules described in this application may also be called units, and refer to a series of computer program segments that can be executed by the processor of an electronic device, can complete fixed functions, and are stored in the memory of the electronic device.
In this embodiment, the functions of the modules/units are as follows:
The framing module 101 is configured to acquire a video stream and perform image framing on the video stream to obtain a framed image set.
In a preferred embodiment of this application, the video stream is a video obtained by performing ultrasound scanning of the part to be detected; in this embodiment, the video stream is a thyroid video stream.
It should be understood that a video stream has a certain continuity. If target position detection, such as thyroid position detection, is performed on the entire video stream, the detection result is likely to be inaccurate. Therefore, in the embodiment of this application, the framing module 101 performs image framing on the video stream to obtain a framed image set, so that target position detection is performed on each frame of the video stream, improving the accuracy of target position detection in the video stream.
In detail, to perform image framing on the video stream and obtain a framed image set, the framing module 101 executes the following steps:
Step I. Query the total number of frames of the video stream;
Step II. Based on the total number of frames, split the video stream into multiple framed pictures;
Step III. Convert the multiple framed pictures into a picture format to obtain a framed image set.
In a preferred embodiment, the total number of frames is obtained by checking the attributes of the corresponding video stream. In a preferred embodiment, the splitting into multiple framed pictures is implemented through a while statement, for example by setting one frame as one picture through the while statement. In a preferred embodiment, the picture format is the jpg format.
Further, to ensure the security and privacy of the framed image set, the framed image set may also be stored in a blockchain node.
The detection module 102 is configured to detect the target area of each framed image in the framed image set by using a pre-trained target area detection model to obtain a target image set.
In a preferred embodiment of this application, the target area detection model includes a YOLOv3 network, which is used for the detection of target areas in images; in this embodiment, the YOLOv3 network is used to detect the target area in each framed image.
Further, the target area detection model includes a convolution layer, a pyramid pooling layer, a fusion layer, and the like.
In detail, to detect the target area of each framed image in the framed image set by using the pre-trained target area detection model and obtain a target image set, the detection module 102 executes the following steps:
Step A. Perform a convolution operation on the framed image by using the convolution layer to obtain a feature image;
Step B. Perform a dimensionality reduction operation on the feature image by using the pyramid pooling layer (Spatial Pyramid Pooling, SPP) to obtain a standard feature image;
Step C. Fuse the bottom-layer features of the framed image with the standard feature image by using the fusion layer to obtain a target feature image;
Step D. Output the detection result of the target feature image by using the activation function of the target area detection model;
Step E. According to the detection result, screen out the framed images in which a target area exists to obtain a target image set.
The convolution layer performs convolution operations on the image to achieve feature extraction; the pyramid pooling layer performs size reduction on the feature image, which avoids nodule misdetection caused by cropping, scaling, and similar problems during image feature extraction; the fusion layer fuses the bottom-layer features of the image into the extracted image features, which reduces the influence of image grayscale changes caused by different gains.
In a preferred example, the bottom-layer features refer to the basic features of the framed image, such as color, length, and width. Preferably, in this embodiment, the fusion is implemented by the CSP (Cross Stage Partial) module in the fusion layer.
In a preferred embodiment, the activation function includes:
(formula image PCTCN2020131991-appb-000009)
where y′_i represents the detection result of the i-th target feature image and s represents the target feature image.
Preferably, in a preferred implementation of this application, the detection result includes x, y, height, width, and category, where x and y represent the center point of the target feature image and the category indicates whether the target feature image is a target area: category 0 means it is not a target area, and category 1 means it is a target area.
The recognition module 103 is configured to recognize the target position sequence of each target image in the target image set by using a pre-trained target position sequence recognition model, screen out abnormal target position sequences from the target position sequences, and delete the target images corresponding to the abnormal target position sequences from the target image set to obtain a standard target image set.
In a preferred embodiment of this application, the target position sequence recognition model includes a Long Short-Term Memory (LSTM) model; the LSTM model is a recurrent neural network over time that includes an input gate, a forget gate, and an output gate.
In this application, the LSTM model is used to recognize the target position sequence of each target image in the target image set so as to identify abnormal target images, helping the user better judge the distribution of nodules in the video stream and thereby improving the accuracy of video-stream-based target position detection.
In detail, to recognize the target position sequence of each target image in the target image set by using the pre-trained target position sequence recognition model, the recognition module 103 executes the following steps:
Step a. Calculate the state value of the target image through the input gate;
Step b. Calculate the activation value of the target image through the forget gate;
Step c. Calculate the state update value of the target image according to the state value and the activation value;
Step d. Calculate the initial position sequence of the state update value by using the output gate;
Step e. Calculate the loss value between the initial position sequence and the corresponding target image label by using the loss function in the target position sequence recognition model, and select the initial position sequences whose loss value is less than a preset threshold to obtain the target position sequence of the corresponding target image.
In an optional embodiment, the state value is calculated as follows:
(formula image PCTCN2020131991-appb-000010)
where i_t represents the state value, the symbol given in formula image PCTCN2020131991-appb-000011 represents the bias of the cell unit in the input gate, w_i represents the activation factor of the input gate, h_{t-1} represents the peak value of the target image at time t-1 of the input gate, x_t represents the target image at time t, and b_i represents the weight of the cell unit in the input gate.
In an optional embodiment, the activation value is calculated as follows:
(formula image PCTCN2020131991-appb-000012)
where f_t represents the activation value, the symbol given in formula image PCTCN2020131991-appb-000013 represents the bias of the cell unit in the forget gate, w_f represents the activation factor of the forget gate, the symbol given in formula image PCTCN2020131991-appb-000014 represents the peak value of the target image at time t-1 of the forget gate, x_t represents the target image input at time t, and b_f represents the weight of the cell unit in the forget gate.
In an optional embodiment, the state update value is calculated as follows:
(formula image PCTCN2020131991-appb-000015)
where c_t represents the state update value, h_{t-1} represents the peak value of the target image at time t-1 of the input gate, and the symbol given in formula image PCTCN2020131991-appb-000016 represents the peak value of the target image at time t-1 of the forget gate.
In an optional embodiment, the initial position sequence is calculated as:
o_t = tanh(c_t)
where o_t represents the initial position sequence, tanh represents the activation function of the output gate, and c_t represents the state update value.
In an optional embodiment, the loss function is a softmax function, and the target image label refers to the target image position sequence marked in the target image by the user in advance. Further, in this application, the initial position sequences whose loss value is less than the preset threshold are selected as the target position sequences, so as to filter out abnormal target images and improve the accuracy of subsequent video-stream-based target position detection.
Further, in the embodiment of this application, the recognition module 103 screens out abnormal target position sequences from the target position sequences and deletes the target images corresponding to them from the target image set to obtain a standard target image set.
For example, suppose the target image set contains 100 target images: the target position sequence recognition model recognizes that in 80 target images the target position sequence of the target area is at the upper left of the corresponding image, in 10 target images it is at the upper right, and in 10 target images no target area exists. The position sequences at the upper right and with no target area can then be identified as abnormal, so the corresponding target images can be deleted, improving the accuracy of video-stream-based target position detection. It should also be noted that if multiple target areas exist in one target image, the correct target position sequence of each target area in that image can be identified by the LSTM model.
The association module 104 is configured to perform image association on all target images in the standard target image set and identify the target position in the video stream according to the standard target image set after image association.
In at least one embodiment of this application, the association module 104 performs image association on all standard target images in the standard target image set and identifies the nodule positions of the video stream according to the standard target image set after image association.
The target association function is the known mean square error regression function. Based on the image association, the distribution of nodules in the video stream can be identified well, helping the user view the position in the video stream where nodules are most prominent.
In summary, the embodiment of this application first performs image framing on the acquired video stream to obtain a framed image set, so that nodule detection is performed on each frame of the video stream, improving the nodule detection accuracy of the video stream; secondly, it detects the target area of each framed image in the framed image set to obtain a target image set, recognizes the target position sequence of each target image in the target image set, and deletes the target images corresponding to abnormal target position sequences to obtain a standard target image set, which makes good use of the context information of the target position sequences in the target images to identify target images with abnormal target position sequences, thereby improving the accuracy of nodule detection in the video stream; further, it performs image association on all standard target images in the standard target image set and identifies the nodule positions of the video stream according to the standard target image set after image association, so that the distribution of nodule positions in the video stream can be viewed well, helping the user find the most prominent nodules in the video stream. Therefore, the apparatus for detecting the position of a target based on a video stream proposed by this application can improve the accuracy of video-stream-based target position detection.
As shown in FIG. 6, it is a schematic structural diagram of the electronic device implementing the method for detecting the position of a target based on a video stream of this application.
The electronic device 1 may include a processor 10, a memory 11, and a bus, and may further include a computer program stored in the memory 11 and runnable on the processor 10, such as a video-stream-based target position detection program 12.
The memory 11 may be volatile or non-volatile, and includes at least one type of readable storage medium, such as flash memory, a mobile hard disk, a multimedia card, a card-type memory (for example, SD or DX memory), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the memory 11 may be an internal storage unit of the electronic device 1, for example a mobile hard disk of the electronic device 1. In other embodiments, the memory 11 may also be an external storage device of the electronic device 1, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash Card equipped on the electronic device 1. Further, the memory 11 may include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 can be used not only to store application software installed in the electronic device 1 and various data, such as the code for video-stream-based target position detection, but also to temporarily store data that has been output or will be output.
In some embodiments, the processor 10 may be composed of integrated circuits, for example a single packaged integrated circuit, or multiple integrated circuits packaged with the same or different functions, including one or more combinations of central processing units (CPU), microprocessors, digital processing chips, graphics processors, and various control chips. The processor 10 is the control core (Control Unit) of the electronic device; it uses various interfaces and lines to connect the components of the entire electronic device, and executes the various functions of the electronic device 1 and processes data by running or executing programs or modules stored in the memory 11 (for example, executing video-stream-based target position detection) and calling data stored in the memory 11.
The bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus can be divided into an address bus, a data bus, a control bus, etc. The bus is configured to implement connection and communication between the memory 11, the at least one processor 10, and the like.
FIG. 6 only shows an electronic device with some of the components. Those skilled in the art will understand that the structure shown in FIG. 6 does not limit the electronic device 1, which may include fewer or more components than shown, a combination of certain components, or a different arrangement of components.
For example, although not shown, the electronic device 1 may also include a power source (such as a battery) for supplying power to the components. Preferably, the power source may be logically connected to the at least one processor 10 through a power management device, so that functions such as charge management, discharge management, and power consumption management are implemented through the power management device. The power source may also include one or more DC or AC power supplies, recharging devices, power failure detection circuits, power converters or inverters, power status indicators, and other arbitrary components. The electronic device 1 may also include various sensors, a Bluetooth module, a Wi-Fi module, etc., which are not repeated here.
Further, the electronic device 1 may also include a network interface; optionally, the network interface may include a wired interface and/or a wireless interface (such as a Wi-Fi interface or a Bluetooth interface), which is usually used to establish a communication connection between the electronic device 1 and other electronic devices.
Optionally, the electronic device 1 may also include a user interface, which may be a display (Display) or an input unit (such as a keyboard (Keyboard)); optionally, the user interface may also be a standard wired interface or wireless interface. Optionally, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, etc. The display may also appropriately be called a display screen or display unit, and is used to display the information processed in the electronic device 1 and to display a visualized user interface.
It should be understood that the embodiments are for illustration only, and the scope of the patent application is not limited by this structure.
The video-stream-based target position detection program 12 stored in the memory 11 of the electronic device 1 is a combination of multiple instructions which, when run in the processor 10, can realize:
acquiring a video stream, and performing image framing on the video stream to obtain a framed image set;
detecting the target area of each framed image in the framed image set by using a pre-trained target area detection model to obtain a target image set;
recognizing the target position sequence of each target image in the target image set by using a pre-trained target position sequence recognition model, screening out abnormal target position sequences from the target position sequences, and deleting the target images corresponding to the abnormal target position sequences from the target image set to obtain a standard target image set;
performing image association on all target images in the standard target image set, and identifying the target position in the video stream according to the standard target image set after image association.
Specifically, for the specific implementation method of the above instructions by the processor 10, reference may be made to the description of the relevant steps in the embodiment corresponding to FIG. 1, which is not repeated here.
Further, if the modules/units integrated in the electronic device 1 are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium, which may be volatile or non-volatile. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a mobile hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM).
In the several embodiments provided by this application, it should be understood that the disclosed device, apparatus, and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of the modules is only a logical function division, and there may be other division manners in actual implementation.
The modules described as separate components may or may not be physically separated, and the components displayed as modules may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional modules in the embodiments of this application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The above integrated units may be implemented in the form of hardware, or in the form of hardware plus software functional modules.
For those skilled in the art, it is obvious that this application is not limited to the details of the above exemplary embodiments and can be implemented in other specific forms without departing from the spirit or basic features of this application.
Therefore, from whatever point of view, the embodiments should be regarded as exemplary and non-limiting. The scope of this application is defined by the appended claims rather than by the above description, and it is therefore intended that all changes falling within the meaning and range of equivalents of the claims are included in this application. Any reference sign in a claim should not be construed as limiting the claim concerned.
The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated in association using cryptographic methods, where each data block contains a batch of network transaction information used to verify the validity of the information (anti-counterfeiting) and to generate the next block. A blockchain may include the underlying blockchain platform, the platform product service layer, the application service layer, and so on.
In addition, it is obvious that the word "comprising" does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or devices stated in the system claims may also be implemented by one unit or device through software or hardware. Words such as "second" are used to denote names and do not denote any particular order.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of this application and not to limit them. Although this application has been described in detail with reference to the preferred embodiments, those of ordinary skill in the art should understand that the technical solutions of this application may be modified or equivalently replaced without departing from the spirit and scope of the technical solutions of this application.

Claims (20)

  1. A method for detecting the position of a target based on a video stream, wherein the method comprises:
    acquiring a video stream, and performing image framing on the video stream to obtain a framed image set;
    detecting the target area of each framed image in the framed image set by using a pre-trained target area detection model to obtain a target image set;
    recognizing the target position sequence of each target image in the target image set by using a pre-trained target position sequence recognition model, screening out abnormal target position sequences from the target position sequences, and deleting the target images corresponding to the abnormal target position sequences from the target image set to obtain a standard target image set;
    performing image association on all target images in the standard target image set, and identifying the target position in the video stream according to the standard target image set after image association.
  2. The method for detecting the position of a target based on a video stream according to claim 1, wherein the detecting the target area of each framed image in the framed image set by using a pre-trained target area detection model to obtain a target image set comprises:
    performing a convolution operation on the framed image by using the convolution layer of the target area detection model to obtain a feature image;
    performing a dimensionality reduction operation on the feature image by using the pyramid pooling layer of the target area detection model to obtain a standard feature image;
    fusing the bottom-layer features of the framed image with the standard feature image by using the fusion layer of the target area detection model to obtain a target feature image;
    outputting the detection result of the target feature image by using the activation function of the target area detection model;
    screening out, according to the detection result, the framed images in which a target area exists, to obtain a target image set.
  3. The method for detecting the position of a target based on a video stream according to claim 1, wherein the recognizing the target position sequence of each target image in the target image set by using a pre-trained target position sequence recognition model comprises:
    calculating the state value of the target image through the input gate of the target position sequence recognition model;
    calculating the activation value of the target image through the forget gate of the target position sequence recognition model;
    calculating the state update value of the target image according to the state value and the activation value;
    calculating the initial position sequence of the state update value by using the output gate of the target position sequence recognition model;
    calculating the loss value between the initial position sequence and the corresponding target image label, and selecting the initial position sequences whose loss value is less than a preset threshold to obtain the target position sequence of the corresponding target image.
  4. The method for detecting the position of a target based on a video stream according to claim 3, wherein the calculating the state update value of the target image according to the state value and the activation value comprises:
    calculating the state update value by the following method:
    (formula image PCTCN2020131991-appb-100001)
    where c_t represents the state update value, h_{t-1} represents the peak value of the target image at time t-1 of the input gate, and the symbol given in formula image PCTCN2020131991-appb-100002 represents the peak value of the target image at time t-1 of the forget gate.
  5. The method for detecting the position of a target based on a video stream according to claim 3, wherein the calculating the initial position sequence of the state update value by using the output gate of the target position sequence recognition model comprises:
    calculating the initial position sequence of the state update value by the following function:
    o_t = tanh(c_t)
    where o_t represents the initial position sequence, tanh represents the activation function of the output gate, and c_t represents the state update value.
  6. The method for detecting the position of a target based on a video stream according to claim 1, wherein the performing image framing on the video stream to obtain a framed image set comprises:
    querying the total number of frames of the video stream;
    splitting the video stream into multiple framed pictures based on the total number of frames;
    converting the multiple framed pictures into a picture format to obtain a framed image set.
  7. The method for detecting the position of a target based on a video stream according to any one of claims 1 to 6, wherein the video stream is a thyroid video stream.
  8. An apparatus for detecting the position of a target based on a video stream, wherein the apparatus comprises:
    a framing module, configured to acquire a video stream and perform image framing on the video stream to obtain a framed image set;
    a detection module, configured to detect the target area of each framed image in the framed image set by using a pre-trained target area detection model to obtain a target image set;
    a recognition module, configured to recognize the target position sequence of each target image in the target image set by using a pre-trained target position sequence recognition model, screen out abnormal target position sequences from the target position sequences, and delete the target images corresponding to the abnormal target position sequences from the target image set to obtain a standard target image set;
    an association module, configured to perform image association on all target images in the standard target image set and identify the target position in the video stream according to the standard target image set after image association.
  9. An electronic device, wherein the electronic device comprises:
    at least one processor; and
    a memory communicatively connected with the at least one processor; wherein
    the memory stores computer program instructions executable by the at least one processor, and the computer program instructions are executed by the at least one processor so that the at least one processor can perform the following steps:
    acquiring a video stream, and performing image framing on the video stream to obtain a framed image set;
    detecting the target area of each framed image in the framed image set by using a pre-trained target area detection model to obtain a target image set;
    recognizing the target position sequence of each target image in the target image set by using a pre-trained target position sequence recognition model, screening out abnormal target position sequences from the target position sequences, and deleting the target images corresponding to the abnormal target position sequences from the target image set to obtain a standard target image set;
    performing image association on all target images in the standard target image set, and identifying the target position in the video stream according to the standard target image set after image association.
  10. The electronic device according to claim 9, wherein the detecting the target area of each framed image in the framed image set by using a pre-trained target area detection model to obtain a target image set comprises:
    performing a convolution operation on the framed image by using the convolution layer of the target area detection model to obtain a feature image;
    performing a dimensionality reduction operation on the feature image by using the pyramid pooling layer of the target area detection model to obtain a standard feature image;
    fusing the bottom-layer features of the framed image with the standard feature image by using the fusion layer of the target area detection model to obtain a target feature image;
    outputting the detection result of the target feature image by using the activation function of the target area detection model;
    screening out, according to the detection result, the framed images in which a target area exists, to obtain a target image set.
  11. The electronic device according to claim 9, wherein the recognizing the target position sequence of each target image in the target image set by using a pre-trained target position sequence recognition model comprises:
    calculating the state value of the target image through the input gate of the target position sequence recognition model;
    calculating the activation value of the target image through the forget gate of the target position sequence recognition model;
    calculating the state update value of the target image according to the state value and the activation value;
    calculating the initial position sequence of the state update value by using the output gate of the target position sequence recognition model;
    calculating the loss value between the initial position sequence and the corresponding target image label, and selecting the initial position sequences whose loss value is less than a preset threshold to obtain the target position sequence of the corresponding target image.
  12. The electronic device according to claim 11, wherein the calculating the state update value of the target image according to the state value and the activation value comprises:
    calculating the state update value by the following method:
    (formula image PCTCN2020131991-appb-100003)
    where c_t represents the state update value, h_{t-1} represents the peak value of the target image at time t-1 of the input gate, and the symbol given in formula image PCTCN2020131991-appb-100004 represents the peak value of the target image at time t-1 of the forget gate.
  13. The electronic device according to claim 11, wherein the calculating the initial position sequence of the state update value by using the output gate of the target position sequence recognition model comprises:
    calculating the initial position sequence of the state update value by the following function:
    o_t = tanh(c_t)
    where o_t represents the initial position sequence, tanh represents the activation function of the output gate, and c_t represents the state update value.
  14. The electronic device according to claim 9, wherein the performing image framing on the video stream to obtain a framed image set comprises:
    querying the total number of frames of the video stream;
    splitting the video stream into multiple framed pictures based on the total number of frames;
    converting the multiple framed pictures into a picture format to obtain a framed image set.
  15. The electronic device according to any one of claims 9 to 14, wherein the video stream is a thyroid video stream.
  16. A computer-readable storage medium storing a computer program, wherein, when the computer program is executed by a processor, the following steps are implemented:
    acquiring a video stream, and performing image framing on the video stream to obtain a framed image set;
    detecting the target area of each framed image in the framed image set by using a pre-trained target area detection model to obtain a target image set;
    recognizing the target position sequence of each target image in the target image set by using a pre-trained target position sequence recognition model, screening out abnormal target position sequences from the target position sequences, and deleting the target images corresponding to the abnormal target position sequences from the target image set to obtain a standard target image set;
    performing image association on all target images in the standard target image set, and identifying the target position in the video stream according to the standard target image set after image association.
  17. The computer-readable storage medium according to claim 16, wherein the detecting the target area of each framed image in the framed image set by using a pre-trained target area detection model to obtain a target image set comprises:
    performing a convolution operation on the framed image by using the convolution layer of the target area detection model to obtain a feature image;
    performing a dimensionality reduction operation on the feature image by using the pyramid pooling layer of the target area detection model to obtain a standard feature image;
    fusing the bottom-layer features of the framed image with the standard feature image by using the fusion layer of the target area detection model to obtain a target feature image;
    outputting the detection result of the target feature image by using the activation function of the target area detection model;
    screening out, according to the detection result, the framed images in which a target area exists, to obtain a target image set.
  18. The computer-readable storage medium according to claim 16, wherein the recognizing the target position sequence of each target image in the target image set by using a pre-trained target position sequence recognition model comprises:
    calculating the state value of the target image through the input gate of the target position sequence recognition model;
    calculating the activation value of the target image through the forget gate of the target position sequence recognition model;
    calculating the state update value of the target image according to the state value and the activation value;
    calculating the initial position sequence of the state update value by using the output gate of the target position sequence recognition model;
    calculating the loss value between the initial position sequence and the corresponding target image label, and selecting the initial position sequences whose loss value is less than a preset threshold to obtain the target position sequence of the corresponding target image.
  19. The computer-readable storage medium according to claim 18, wherein the calculating the state update value of the target image according to the state value and the activation value comprises:
    calculating the state update value by the following method:
    (formula image PCTCN2020131991-appb-100005)
    where c_t represents the state update value, h_{t-1} represents the peak value of the target image at time t-1 of the input gate, and the symbol given in formula image PCTCN2020131991-appb-100006 represents the peak value of the target image at time t-1 of the forget gate.
  20. The computer-readable storage medium according to claim 18, wherein the calculating the initial position sequence of the state update value by using the output gate of the target position sequence recognition model comprises:
    calculating the initial position sequence of the state update value by the following function:
    o_t = tanh(c_t)
    where o_t represents the initial position sequence, tanh represents the activation function of the output gate, and c_t represents the state update value.
PCT/CN2020/131991 2020-10-12 2020-11-27 Method, apparatus, device, and medium for detecting the position of a target based on a video stream WO2021189911A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011086228.9A CN112137591B (zh) 2020-10-12 2020-10-12 Method, apparatus, device, and medium for detecting the position of a target based on a video stream
CN202011086228.9 2020-10-12

Publications (1)

Publication Number Publication Date
WO2021189911A1 (zh) 2021-09-30

Family

ID=73952998

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/131991 WO2021189911A1 (zh) 2020-10-12 2020-11-27 Method, apparatus, device, and medium for detecting the position of a target based on a video stream

Country Status (2)

Country Link
CN (1) CN112137591B (zh)
WO (1) WO2021189911A1 (zh)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114399633A (zh) * 2022-01-19 2022-04-26 北京石油化工学院 Mobile electronic device position detection method based on the YOLOv5s model
CN114764786A (zh) * 2022-03-14 2022-07-19 什维新智医疗科技(上海)有限公司 Real-time lesion area detection device based on an ultrasound video stream
CN115690615A (zh) * 2022-10-11 2023-02-03 杭州视图智航科技有限公司 Deep learning target recognition method and system for video streams
CN116363557A (zh) * 2023-03-17 2023-06-30 杭州再启信息科技有限公司 Self-learning annotation method, system, and medium for consecutive frames

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112907660B (zh) * 2021-01-08 2022-10-04 浙江大学 Underwater laser target detector for small samples
CN114951017B (zh) * 2022-05-12 2023-05-30 深圳市顺鑫昌文化股份有限公司 Online intelligent detection and error reporting system for label printing

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2672424A1 (en) * 2012-06-08 2013-12-11 Realeyes OÜ Method and apparatus using adaptive face registration method with constrained local models and dynamic model switching
EP2672423A1 (en) * 2012-06-08 2013-12-11 Realeyes OÜ Method and apparatus for locating features of an object using deformable models
CN109859216A (zh) * 2019-02-16 2019-06-07 深圳市未来感知科技有限公司 Distance measurement method, apparatus, device, and storage medium based on deep learning
CN110147722A (zh) * 2019-04-11 2019-08-20 平安科技(深圳)有限公司 Video processing method, video processing apparatus, and terminal device

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1570953A (zh) * 2003-07-22 2005-01-26 中国科学院自动化研究所 Face detection method in a mobile computing environment
CN103413295B (zh) * 2013-07-12 2016-12-28 长沙理工大学 Long-range multi-target tracking method for video
CN105989367B (zh) * 2015-02-04 2019-06-28 阿里巴巴集团控股有限公司 Target acquisition method and device
US10296793B2 (en) * 2016-04-06 2019-05-21 Nec Corporation Deep 3D attention long short-term memory for video-based action recognition
CN110337269B (zh) * 2016-07-25 2021-09-21 脸谱科技有限责任公司 Method and apparatus for inferring user intent based on neuromuscular signals
CN107451601A (zh) * 2017-07-04 2017-12-08 昆明理工大学 Moving workpiece recognition method based on a spatiotemporal-context fully convolutional network
CN108230358A (zh) * 2017-10-27 2018-06-29 北京市商汤科技开发有限公司 Target tracking and neural network training method, apparatus, storage medium, and electronic device
CN111160229B (zh) * 2019-12-26 2024-04-02 北京工业大学 Video target detection method and apparatus based on an SSD network
CN111414916B (zh) * 2020-02-29 2024-05-31 中国平安财产保险股份有限公司 Method and apparatus for extracting and generating text content in images, and readable storage medium
CN111581436B (zh) * 2020-03-30 2024-03-22 西安天和防务技术股份有限公司 Target recognition method, apparatus, computer device, and storage medium
CN111666857B (zh) * 2020-05-29 2023-07-04 平安科技(深圳)有限公司 Human behavior recognition method, apparatus, and storage medium based on environmental semantic understanding


Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114399633A (zh) * 2022-01-19 2022-04-26 北京石油化工学院 Mobile electronic device position detection method based on the YOLOv5s model
CN114764786A (zh) * 2022-03-14 2022-07-19 什维新智医疗科技(上海)有限公司 Real-time lesion area detection device based on an ultrasound video stream
CN115690615A (zh) * 2022-10-11 2023-02-03 杭州视图智航科技有限公司 Deep learning target recognition method and system for video streams
CN115690615B (zh) * 2022-10-11 2023-11-03 杭州视图智航科技有限公司 Deep learning target recognition method and system for video streams
CN116363557A (zh) * 2023-03-17 2023-06-30 杭州再启信息科技有限公司 Self-learning annotation method, system, and medium for consecutive frames
CN116363557B (zh) * 2023-03-17 2023-09-19 杭州再启信息科技有限公司 Self-learning annotation method, system, and medium for consecutive frames

Also Published As

Publication number Publication date
CN112137591B (zh) 2021-07-23
CN112137591A (zh) 2020-12-29

Similar Documents

Publication Publication Date Title
WO2021189911A1 (zh) Method, apparatus, device, and medium for detecting the position of a target based on a video stream
WO2021189912A1 (zh) Method and apparatus for detecting a target in an image, electronic device, and storage medium
WO2018108129A1 (zh) Method and apparatus for recognizing object category, and electronic device
WO2021208735A1 (zh) Behavior detection method, apparatus, and computer-readable storage medium
TWI462035B (zh) Object detection metadata
WO2022156066A1 (zh) Character recognition method and apparatus, electronic device, and storage medium
WO2022105179A1 (zh) Biometric image recognition method and apparatus, electronic device, and readable storage medium
WO2021189855A1 (zh) Image recognition method and apparatus based on a CT sequence, electronic device, and medium
WO2020253508A1 (zh) Abnormal cell detection method, apparatus, and computer-readable storage medium
WO2021151338A1 (zh) Medical image picture analysis method and apparatus, electronic device, and readable storage medium
CN103503000A (zh) Facial recognition
WO2021189913A1 (zh) Method and apparatus for segmenting a target in an image, electronic device, and storage medium
WO2021151313A1 (zh) Certificate authenticity identification method and apparatus, electronic device, and storage medium
WO2022095359A1 (zh) Anti-screen-capture-based information security protection method and apparatus, electronic device, and medium
WO2022227218A1 (zh) Drug name recognition method and apparatus, computer device, and storage medium
CN112507934A (zh) Liveness detection method and apparatus, electronic device, and storage medium
WO2022126914A1 (zh) Liveness detection method and apparatus, electronic device, and storage medium
WO2021217852A1 (zh) Damage detection method and apparatus, electronic device, and medium
WO2021189856A1 (zh) Certificate verification method and apparatus, electronic device, and medium
CN112507923A (zh) Certificate re-photographing detection method and apparatus, electronic device, and medium
WO2021217853A1 (zh) Intelligent damage assessment method and apparatus for damage images, electronic device, and storage medium
WO2018137226A1 (zh) Fingerprint extraction method and apparatus
CN117197864A (zh) Deep-learning-based ID photo classification and bareheaded detection method and system
WO2023134080A1 (zh) Camera cheating recognition method, apparatus, device, and storage medium
US10509986B2 (en) Image similarity determination apparatus and image similarity determination method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20926532

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20926532

Country of ref document: EP

Kind code of ref document: A1