CN115631339A - Visual feature extraction method and device and electronic equipment - Google Patents
Visual feature extraction method and device and electronic equipment
- Publication number
- CN115631339A CN115631339A CN202211071629.6A CN202211071629A CN115631339A CN 115631339 A CN115631339 A CN 115631339A CN 202211071629 A CN202211071629 A CN 202211071629A CN 115631339 A CN115631339 A CN 115631339A
- Authority
- CN
- China
- Prior art keywords
- image
- frame
- point
- feature
- visual
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a visual feature extraction method and device and electronic equipment. The method comprises the following steps: for each frame of image in a target image set, performing image preprocessing on an Nth frame of image in the target image set at a first starting moment, wherein the target image set is an image frame sequence from which visual features are to be extracted, the image preprocessing comprises a down-sampling operation, and N is a positive integer greater than or equal to 1; performing a visual feature extraction operation on the Nth frame of image at a second starting moment while synchronously performing image preprocessing on the (N + 1)th frame of image, until every frame of image has completed image preprocessing and visual feature extraction, wherein the first starting moment is before the second starting moment; and outputting the visual features of each frame of image in the target image set. The invention solves the technical problem in the related art that visual feature extraction takes a long time.
Description
Technical Field
The invention relates to the technical field of image processing, in particular to a visual feature extraction method and device and electronic equipment.
Background
In compact visual search, visual feature extraction is the most computationally complex part. Extracting Scale-Invariant Feature Transform (SIFT) features from a Video Graphics Array (VGA) image typically takes several seconds on a general-purpose processor and increases the processor's power consumption; as image resolutions grow ever larger, visual feature extraction takes even longer and cannot meet an intelligent terminal's requirement for low-latency response.
Disclosure of Invention
The embodiment of the invention provides a visual feature extraction method, a visual feature extraction device and electronic equipment, which at least solve the technical problem in the related art that visual feature extraction takes a long time.
According to an aspect of an embodiment of the present invention, there is provided a visual feature extraction method including: for each frame of image in a target image set, performing image preprocessing on an Nth frame of image in the target image set at a first starting moment, wherein the target image set is an image frame sequence from which visual features are to be extracted, the image preprocessing comprises a down-sampling operation, and N is a positive integer greater than or equal to 1; performing a visual feature extraction operation on the Nth frame of image at a second starting moment while synchronously performing image preprocessing on the (N + 1)th frame of image, until every frame of image has completed image preprocessing and visual feature extraction, wherein the first starting moment is before the second starting moment; and outputting the visual features of each frame of image in the target image set.
According to another aspect of the embodiments of the present invention, there is also provided a visual feature extraction device including: the first processing unit is used for carrying out image preprocessing on the Nth frame image in the target image set at a first starting moment for each frame image in the target image set; the target image set is an image frame sequence of visual features to be extracted, the image preprocessing comprises a down-sampling operation, and N is a positive integer greater than or equal to 1; the second processing unit is used for performing visual feature extraction operation on the Nth frame of image at a second starting time, and synchronously performing image preprocessing on the (N + 1) th frame of image until each frame of image finishes image preprocessing and visual feature extraction operation, wherein the time point of the first starting time is before the second starting time; and the output unit is used for outputting the visual characteristics of each frame of image in the target image set.
According to another aspect of the embodiments of the present invention, there is also provided an electronic device, including a memory and a processor, where the memory stores therein a computer program, and the processor is configured to execute the above-mentioned visual feature extraction method through the computer program.
According to still another aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium, in which a computer program is stored, wherein the computer program is configured to execute the above-mentioned visual feature extraction method when running.
In the embodiment of the invention, for each frame of image in a target image set, image preprocessing is performed on the Nth frame of image at a first starting moment; the target image set is an image frame sequence from which visual features are to be extracted, the image preprocessing comprises a down-sampling operation, and N is a positive integer greater than or equal to 1; a visual feature extraction operation is performed on the Nth frame of image at a second starting moment while image preprocessing is synchronously performed on the (N + 1)th frame of image, until every frame of image has completed image preprocessing and visual feature extraction, wherein the first starting moment is before the second starting moment. In this method, because multiple image processing operations of the visual feature extraction proceed simultaneously in a pipelined manner, the time taken to extract the visual features of the images can be shortened, the visual feature extraction efficiency is improved, and the technical problem in the related art that visual feature extraction takes a long time is solved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a schematic diagram of an application environment of an alternative visual feature extraction method according to an embodiment of the invention;
FIG. 2 is a schematic flow chart of an alternative visual feature extraction according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a feature extraction complexity analysis of an alternative visual feature extraction method according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating the hardware and software architecture of an alternative visual feature extraction method according to an embodiment of the present invention;
FIG. 5 is a frame-level pipeline diagram of an alternative visual feature extraction method according to an embodiment of the invention;
FIG. 6 is a block-level pipeline diagram of another alternative visual feature extraction method according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a feature detection framework of an alternative visual feature extraction method according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of a feature description framework of yet another alternative visual feature extraction method according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of an alternative visual feature extraction apparatus according to an embodiment of the present invention;
fig. 10 is a schematic structural diagram of an alternative electronic device according to an embodiment of the invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
According to an aspect of the embodiments of the present invention, there is provided a visual feature extraction method, which may be applied, but not limited, to the application environment shown in fig. 1 as an optional implementation manner. As shown in fig. 1, a user 102 may interact with a user device 104. The user equipment 104 includes a memory 106 and a processor 108. In this embodiment, the user device 104 may, but is not limited to, perform the following operations to obtain the visual characteristics of each frame of image:
for each frame of image in a target image set, performing image preprocessing on an Nth frame of image in the target image set at a first starting moment; the target image set is an image frame sequence of visual features to be extracted, the image preprocessing comprises a down-sampling operation, and N is a positive integer greater than or equal to 1;
performing visual feature extraction operation on the Nth frame of image at a second starting time, and synchronously performing image preprocessing on the (N + 1) th frame of image until each frame of image completes image preprocessing and visual feature extraction operation, wherein the time point of the first starting time is before the second starting time;
and outputting the visual characteristics of each frame of image in the target image set.
Optionally, the user device 104 includes, but is not limited to, a mobile phone, a tablet computer, a notebook computer, a PC, a vehicle-mounted electronic device, a wearable device, and other terminals. The above is merely an example, and this is not limited in this embodiment.
As shown in fig. 3, fig. 3 shows the complexity (measured as computation time) of each algorithm module in Compact Descriptors for Visual Search (CDVS) compact visual feature extraction. CDVS is a standard technique for image visual features. As can be seen from fig. 3, image preprocessing, feature point detection and feature point description account for more than 90% of the overall complexity; at the same time, these three modules consist mostly of regular logical operations and are therefore well suited to hardware acceleration. Modules such as local feature aggregation and coordinate compression have low algorithmic complexity and are better implemented in software. Therefore, as shown in fig. 4, in the embodiment of the present invention the steps implemented in hardware are image preprocessing, feature point detection, and feature point description, and the steps implemented in software are image feature aggregation, image feature compression, and coordinate compression.
As an alternative implementation manner, as shown in fig. 2, an embodiment of the present invention provides a visual feature extraction method, including the following steps:
s202, for each frame of image in a target image set, carrying out image preprocessing on the Nth frame of image in the target image set at a first starting moment; the target image set is an image frame sequence of visual features to be extracted, the image preprocessing comprises a down-sampling operation, and N is a positive integer greater than or equal to 1;
s204, performing visual feature extraction operation on the Nth frame of image at a second starting moment, and synchronously performing image preprocessing on the (N + 1) th frame of image until each frame of image finishes image preprocessing and visual feature extraction operation, wherein the time point of the first starting moment is before the second starting moment;
s206, outputting the visual characteristics of each frame of image in the target image set.
In the embodiment of the present invention, fig. 5 shows the pipeline process of performing feature extraction and preprocessing in parallel on the frames of the target image set. Here, the image preprocessing includes, but is not limited to, down-sampling each frame of image, i.e., scaling the width and height of the image to 1/N of the original; for example, scaling the width and height of an image with a resolution of 1920 × 1080 to 1/4 of the original yields a preprocessed image with a resolution of (1920/4) × (1080/4). Because feature extraction and preprocessing run in parallel on different frames at the same moment, the processing time of visual feature extraction is shortened.
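The 1/N down-sampling step described above can be sketched as follows. This is a minimal illustration using simple stride-based sampling; the patent does not specify which resampling filter is used, so the sampling strategy here is an assumption.

```python
# Sketch of 1/N down-sampling (assumption: plain stride-based sampling,
# taking the top-left pixel of each n-by-n block; the patent does not
# specify the resampling filter).
def downsample(pixels, width, height, n):
    """Scale an image stored as a flat row-major list to (width/n, height/n)."""
    out_w, out_h = width // n, height // n
    out = []
    for y in range(out_h):
        for x in range(out_w):
            # Pick one representative pixel per n-by-n block.
            out.append(pixels[(y * n) * width + (x * n)])
    return out, out_w, out_h

# A 1920x1080 frame scaled by 1/4 yields a 480x270 preprocessed image.
frame = [0] * (1920 * 1080)
small, w, h = downsample(frame, 1920, 1080, 4)
print(w, h, len(small))  # 480 270 129600
```

In practice the hardware module would likely apply a low-pass filter before sampling to avoid aliasing; the stride-based version above only illustrates the resolution arithmetic.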
In the embodiment of the invention, for each frame of image in a target image set, image preprocessing is performed on the Nth frame of image at a first starting moment; the target image set is an image frame sequence from which visual features are to be extracted, the image preprocessing comprises a down-sampling operation, and N is a positive integer greater than or equal to 1; a visual feature extraction operation is performed on the Nth frame of image at a second starting moment while image preprocessing is synchronously performed on the (N + 1)th frame of image, until every frame of image has completed image preprocessing and visual feature extraction, wherein the first starting moment is before the second starting moment. In this method, because multiple image processing operations of the visual feature extraction proceed simultaneously in a pipelined manner, the time taken to extract the visual features of the images can be shortened, the visual feature extraction efficiency is improved, and the technical problem in the related art that visual feature extraction takes a long time is solved.
In one or more embodiments, the visual feature extraction operation includes feature point detection and feature point description, and the method further includes:
sequentially executing the following operations on each frame of image in the target image set until each coding unit in each frame of image completes feature point detection and feature point description:
detecting the characteristic point of the Nth coding unit at a third starting moment;
performing feature point description on the Nth coding unit at a fourth starting moment, and synchronously performing feature point detection on the (N + 1) th coding unit; wherein N is a positive integer greater than or equal to 1, and the time point of the third starting time is before the fourth starting time.
In the embodiment of the present invention, fig. 6 shows the pipeline process of block-level (coding-unit-level) pipelining applied to each frame of image in the target image set. While feature extraction and preprocessing run in parallel on different frames, feature point detection and feature point description also run in parallel on different coding units within a frame, which further shortens the processing time of visual feature extraction.
In one or more embodiments, the feature point detection includes key point detection and key point fine positioning, and the method further includes:
sequentially executing the following operations on each pixel point in the current coding unit until each pixel point in each coding unit completes the key point detection and key point fine positioning:
detecting key points of the Nth pixel point at a fifth starting moment;
carrying out key point fine positioning on the Nth pixel point at a sixth starting moment, and synchronously carrying out key point detection on the (N + 1) th pixel point; and N is a positive integer greater than or equal to 1, and the time point of the fifth starting time is before the sixth starting time.
In the embodiment of the invention, by the above technical means, feature extraction and preprocessing run in parallel on different frames, feature point detection and feature point description run in parallel on different coding units, and key point detection and key point fine positioning run in parallel on different pixel points within the current coding unit, thereby shortening the processing time of visual feature extraction.
In one or more embodiments, the visual feature extraction method further includes:
based on a preset hardware processing module, carrying out image preprocessing and visual feature extraction operation on each frame of image in the target image set;
performing software-based feature processing on each frame of image in the target image set based on a software processing module associated with the hardware processing module, wherein the software-based feature processing comprises feature aggregation, feature compression and coordinate compression.
As shown in fig. 3 and fig. 4, image scaling (in the CDVS flow, a high-resolution image must first be scaled, i.e., the image preprocessing operation), key point detection, and key point description account for more than 90% of the overall complexity, and the logical operations of these three modules are regular, so their computation can be accelerated in hardware. Modules such as local feature aggregation and coordinate compression have low algorithmic complexity but irregular operations, so local feature aggregation and coordinate compression are better computed in a software module.
In one or more embodiments, the above visual feature extraction method further includes:
acquiring a target Gaussian image from an external memory, wherein the target Gaussian image is the image used, during feature point description, by key points whose spatial positions are close to each other;
and storing the target Gaussian image in an on-chip memory so that the spatially close key points share the target Gaussian image.
In one or more embodiments, the visual features of each frame of image are Scale-Invariant Feature Transform (SIFT) features.
Based on the above embodiments, in one application embodiment, the visual feature extraction method combines frame-level and block-level pipelining to achieve hardware (processor) acceleration of the feature extraction process. The frame-level pipeline is shown in fig. 5: while the 1st frame undergoes image preprocessing, feature extraction can be performed on the 0th frame at the same time.
The block-level pipeline is shown in fig. 6; feature extraction comprises two processes, feature detection and feature description. While the 0th coding unit of the current video frame undergoes feature description, feature detection can be performed on the 1st coding unit at the same time. Generally, the more pipeline stages, the higher the data throughput and processing performance. In the embodiment of the present invention, the image is reasonably partitioned into blocks (a plurality of coding units) according to the actual computing and storage resource constraints, which further saves computing resources.
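The overlap pattern described above (stage 2 runs on item N while stage 1 runs on item N+1) can be sketched in software with two worker threads. The stage functions below are placeholders standing in for the patent's hardware preprocessing and feature extraction modules; only the scheduling pattern is illustrated.

```python
# Minimal sketch of a two-stage frame-level pipeline: while feature
# extraction runs on frame N, preprocessing of frame N+1 proceeds in
# parallel. Stage bodies are placeholders, not the patent's implementation.
from concurrent.futures import ThreadPoolExecutor

def preprocess(frame):            # stage 1: e.g. down-sampling
    return ("pre", frame)

def extract_features(pre_frame):  # stage 2: detection + description
    return ("feat", pre_frame)

def pipeline(frames):
    results = []
    with ThreadPoolExecutor(max_workers=2) as pool:
        pending = pool.submit(preprocess, frames[0])
        for nxt in frames[1:]:
            pre = pending.result()
            # Overlap: extract features of frame N while preprocessing N+1.
            feat = pool.submit(extract_features, pre)
            pending = pool.submit(preprocess, nxt)
            results.append(feat.result())
        results.append(extract_features(pending.result()))
    return results

print(pipeline([0, 1, 2]))  # features for every frame, in order
```

The same pattern applies one level down to the block-level pipeline (feature description of coding unit N overlapping feature detection of coding unit N+1).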
As shown in fig. 7, fig. 7 is a block diagram of the micro-architecture implementing feature point detection and fine positioning. It mainly comprises 5 parts: an input image group buffer, a control unit, key point detection, key point fine positioning, and key point output. Key point detection is responsible for detecting local feature points in the image from which visual features are to be extracted. Key point fine positioning builds on key point detection: it refines the positions of the preliminarily detected key points to find their more accurate locations, and at the same time filters out edge points according to the key points' response characteristics. Key point output integrates and outputs the key point information remaining after filtering. Key point detection and key point fine positioning are implemented with a pixel-level pipeline.
Key point detection is implemented with the interest-point detector of the CDVS standard, ALP (A Low-degree Polynomial), which approximates image filtering with low-degree polynomials and thereby detects interest points. In addition, to reduce the memory required by interest point detection, CDVS further provides a frequency-domain filtering method based on a block scale space, which divides the original scale space into several overlapping sub-blocks and then performs interest point detection on each of them separately. Block-based interest point detection makes feature extraction more amenable to parallel acceleration and hardware implementation.
After key points are detected, in order to adapt to different descriptor length constraints, a subset of the local features must be selected: the local features are ranked by importance to obtain a feature subset. Feature selection in the CDVS standard first computes a relevance score for each local feature and then ranks the local features by that score. The relevance characterizes the prior probability that a local feature point of a query image will be matched by the correct database feature. The relevance score of a local feature is determined by, among other factors, the scale of the key point, its extremal response in scale space, and the distance from the key point's position to the image center.
The micro-architecture of feature point description is shown in fig. 8. It mainly comprises control logic, a data reading module, gradient calculation, local feature calculation, and feature summarization. The control logic is responsible for controlling the whole module and coordinating the operation of every stage of the pipeline. Data reading loads the Gaussian images required by spatially close key points from external memory into on-chip memory, so that several adjacent key points share the same Gaussian image, greatly saving the bandwidth of the description module. Gradient calculation and local feature description can be implemented with multiple parallel paths to increase the hardware processing speed.
In the feature point description process, for each detected key point, a local feature is described from the local area around the key point. This area is centered on the key point's position and rotated according to the key point's principal direction, so that the horizontal axis of the area is aligned with the feature point's principal direction. The local area is divided into 4 × 4 = 16 subspaces, each treated as a unit. Within each unit, every pixel is assigned to one of 8 predefined directions according to its gradient direction, and an 8-dimensional histogram is obtained by accumulation. The gradient direction histograms of the units are concatenated in order to form the gradient direction histogram of the local region. This yields a local feature descriptor represented as a 128-dimensional histogram vector.
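The 4 × 4 × 8 = 128-dimensional descriptor layout described above can be sketched as follows. The gradient samples here are synthetic and already rotated to the principal direction; in the real pipeline they come from the Gaussian-filtered patch around the key point, and practical SIFT implementations also interpolate between neighboring bins, which is omitted here.

```python
# Sketch of the 128-dimensional descriptor: 16 sub-regions (4x4 cells),
# each contributing an 8-bin gradient-orientation histogram.
import math

def describe(gradients):
    """gradients: list of (cell_x, cell_y, angle_rad, magnitude) samples,
    cell indices in 0..3, angles already rotated to the principal direction."""
    hist = [0.0] * 128  # 4 * 4 cells * 8 orientation bins
    for cx, cy, angle, mag in gradients:
        b = int((angle % (2 * math.pi)) / (2 * math.pi) * 8) % 8
        # Concatenation order: cell-major, then orientation bin.
        hist[(cy * 4 + cx) * 8 + b] += mag
    return hist

d = describe([(0, 0, 0.0, 1.0), (3, 3, math.pi, 2.0)])
print(len(d), d[0], d[(3 * 4 + 3) * 8 + 4])  # 128 1.0 2.0
```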
For local feature compression, the embodiment of the present invention adopts the low-complexity transform coding scheme that CDVS uses for local feature descriptor compression: the local features are transformed, quantized, and variable-length coded. The scheme transform-codes the 8 direction bins within each SIFT unit histogram separately. Of the transformed and quantized descriptor elements, only the elements in certain dimensions are selected and coded into the bitstream. At each code rate, the selected elements are determined according to a table predefined in the standard, so that retrieval performance is maximized. The number of selected elements is between 20 and 128, determined by the descriptor length constraint.
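The rate-dependent element selection can be sketched as a simple table lookup. The `SELECTION_TABLE` below is purely illustrative; the actual dimension lists are defined by the CDVS standard and are not reproduced here.

```python
# Sketch of descriptor-length-driven element selection: after transform and
# quantization, only the dimensions listed in a rate-dependent table are kept.
# SELECTION_TABLE is a hypothetical stand-in for the standard's tables.
SELECTION_TABLE = {
    512: list(range(20)),    # low rate: keep 20 elements
    4096: list(range(128)),  # high rate: keep all 128 elements
}

def select_elements(quantized, rate):
    keep = SELECTION_TABLE[rate]
    return [quantized[i] for i in keep]

desc = list(range(128))  # a quantized 128-element descriptor
print(len(select_elements(desc, 512)), len(select_elements(desc, 4096)))  # 20 128
```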
Local feature aggregation and coordinate compression are implemented by a software program. For compressing the local feature coordinates, a position-histogram coding scheme is adopted. The position information of the local feature points is quantized into a statistical histogram, and the histogram coding is divided into two parts: identification-map coding and identification-count coding. The identification count map records, from top to bottom and from left to right, the number of points contained in each block that has feature points, and the identification map is a matrix indicating whether each grid cell contains a local feature. Finally, the histogram is encoded using context-based arithmetic coding.
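The two parts of the position histogram can be sketched as follows. The grid size is an arbitrary assumption and the final context-based arithmetic coding stage is omitted; only the quantization into an identification map plus per-cell counts is shown.

```python
# Sketch of the position histogram: key point coordinates are quantized to a
# grid; the identification map marks non-empty cells, and the count list
# records how many points each occupied cell holds, scanned top-to-bottom,
# left-to-right. Grid size (4) and entropy coding are assumptions/omissions.
def position_histogram(points, img_w, img_h, grid=4):
    counts = [[0] * grid for _ in range(grid)]
    for x, y in points:
        cy = min(y * grid // img_h, grid - 1)
        cx = min(x * grid // img_w, grid - 1)
        counts[cy][cx] += 1
    id_map = [[1 if c else 0 for c in row] for row in counts]
    count_list = [c for row in counts for c in row if c]  # occupied cells only
    return id_map, count_list

id_map, count_list = position_histogram([(10, 10), (15, 12), (200, 200)], 256, 256)
print(id_map[0][0], count_list)  # 1 [2, 1]
```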
CDVS uses the Scalable Compressed Fisher Vector (SCFV) for global feature aggregation. To compress the high-dimensional Fisher Vector, several discriminative Gaussian components of the Gaussian mixture model are selected and retained. SCFV achieves excellent matching performance with memory overhead far smaller than that of typical Fisher Vector compression methods based on PCA or product quantization. The feature aggregation descriptor of the CDVS standard achieves high retrieval performance while meeting the low-memory and low-computational-complexity requirements of every stage of the feature extraction process.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.
According to another aspect of the embodiments of the present invention, there is also provided a visual feature extraction apparatus for implementing the visual feature extraction method described above. As shown in fig. 9, the apparatus includes:
a first processing unit 902, configured to perform image preprocessing on an nth frame image in a target image set at a first starting time for each frame image in the target image set; the target image set is an image frame sequence of visual features to be extracted, the image preprocessing comprises a down-sampling operation, and N is a positive integer greater than or equal to 1;
a second processing unit 904, configured to perform a visual feature extraction operation on the nth frame of image at a second starting time, and perform image preprocessing on the (N + 1) th frame of image synchronously until each frame of image completes the image preprocessing and the visual feature extraction operation, where a time point of the first starting time is before the second starting time;
an output unit 906, configured to output a visual feature of each frame of image in the target image set.
In the embodiment of the invention, for each frame of image in a target image set, image preprocessing is performed on the Nth frame of image at a first starting time, where the target image set is an image frame sequence from which visual features are to be extracted, the image preprocessing includes a down-sampling operation, and N is a positive integer greater than or equal to 1. A visual feature extraction operation is performed on the Nth frame of image at a second starting time while image preprocessing is synchronously performed on the (N + 1)th frame of image, until every frame of image has completed both image preprocessing and the visual feature extraction operation, the first starting time preceding the second starting time. Because multiple image processing operations of the visual feature extraction are carried out simultaneously in a pipeline manner, the time for extracting the visual features of the images is shortened, the visual feature extraction efficiency is improved, and the technical problem of long visual feature extraction time in the related art is solved.
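The frame-level pipeline summarized above can be sketched as a small Python simulation. The stage bodies are dummy stand-ins, not the patented hardware implementation: `preprocess` merely down-samples a list, and `extract_features` summarizes it, but the overlap structure (frame N + 1 is preprocessed while frame N is in feature extraction) matches the described scheme.

```python
from concurrent.futures import ThreadPoolExecutor

def preprocess(frame):
    # Stand-in for image preprocessing: down-sample by a factor of 2.
    return frame[::2]

def extract_features(frame):
    # Stand-in for the visual feature extraction operation.
    return {"n_samples": len(frame), "mean": sum(frame) / len(frame)}

def pipelined_extract(frames):
    """While features of frame N are extracted, frame N + 1 is preprocessed."""
    results = []
    with ThreadPoolExecutor(max_workers=2) as pool:
        pending = pool.submit(preprocess, frames[0])
        for nxt in frames[1:]:
            pre = pending.result()
            feat_future = pool.submit(extract_features, pre)
            pending = pool.submit(preprocess, nxt)  # runs alongside extraction
            results.append(feat_future.result())
        results.append(extract_features(pending.result()))
    return results
```

With K frames and roughly equal stage times, the pipeline finishes in about (K + 1) stage slots instead of 2K for strictly sequential processing.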
In one or more embodiments, the above visual feature extraction device further includes:
a third processing unit, configured to sequentially perform the following operations on each frame of image in the target image set until each coding unit in each frame of image completes feature point detection and feature point description:
performing feature point detection on the Nth coding unit at a third starting moment;
performing feature point description on the Nth coding unit at a fourth starting moment, and synchronously performing feature point detection on the (N + 1)th coding unit; wherein N is a positive integer greater than or equal to 1, and the time point of the third starting time is before the fourth starting time.
In one or more embodiments, the feature point detection includes key point detection and key point fine positioning, and the visual feature extraction apparatus further includes:
the fourth processing unit is used for sequentially executing the following operations on each pixel point in the current coding unit until each pixel point in each coding unit completes the key point detection and key point fine positioning:
detecting key points of the Nth pixel point at a fifth starting moment;
carrying out key point fine positioning on the Nth pixel point at a sixth starting moment, and synchronously carrying out key point detection on the (N + 1) th pixel point; and N is a positive integer greater than or equal to 1, and the time point of the fifth starting time is before the sixth starting time.
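The key point fine-positioning stage can be illustrated with the sub-pixel refinement used in SIFT-style detectors: fit a parabola through the response values around a detected extremum and solve for the offset. The 1-D form and function name below are illustrative simplifications; SIFT performs the equivalent 3-D fit over space and scale.

```python
def refine_keypoint_1d(f_prev, f_center, f_next):
    """Sub-pixel offset of a local extremum from three neighboring
    response values, via a quadratic (parabola) fit.

    Returns the offset relative to the center sample; |offset| > 0.5
    would indicate the extremum lies closer to a neighboring sample.
    """
    d1 = 0.5 * (f_next - f_prev)           # first derivative (central difference)
    d2 = f_prev - 2.0 * f_center + f_next  # second derivative
    if d2 == 0.0:
        return 0.0                         # flat neighborhood: no refinement
    return -d1 / d2
```

Because this step reads only a few already-computed response values per candidate, it pipelines naturally behind the key point detection of the next pixel, as described above.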
In one or more embodiments, the visual feature extraction apparatus further includes:
the hardware processing unit is used for carrying out image preprocessing and visual feature extraction operation on each frame of image in the target image set based on a preset hardware processing module;
and the software processing unit is used for performing feature soft processing on each frame of image in the target image set based on a software processing module associated with the hardware processing module, wherein the feature soft processing comprises feature aggregation, feature compression and coordinate compression.
In one or more embodiments, the visual feature extraction apparatus further includes:
the device comprises an acquisition unit, a storage unit and a processing unit, wherein the acquisition unit is used for acquiring a target Gaussian image from an external storage, and the target Gaussian image is an image used by key pixel points with similar spatial positions in the process of describing feature points;
and the storage unit is used for storing the target Gaussian image into an on-chip memory so as to enable the key pixel points with similar spatial positions to share the target Gaussian image.
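The sharing scheme above can be sketched as a tile cache. The class name, tile keying, and string payloads are hypothetical stand-ins; the point is that key pixel points falling in the same spatial tile trigger only one fetch of the Gaussian image from external memory into the on-chip copy.

```python
class GaussianTileCache:
    """Minimal sketch: keypoints with similar spatial positions reuse one
    Gaussian image fetched from external memory into on-chip storage."""

    def __init__(self, external_memory, tile):
        self.external = external_memory  # tile id -> Gaussian image data
        self.tile = tile                 # tile side length in pixels
        self.on_chip = {}                # stand-in for on-chip memory
        self.fetches = 0                 # external-memory accesses performed

    def get(self, x, y):
        key = (x // self.tile, y // self.tile)
        if key not in self.on_chip:      # fetch once per tile
            self.on_chip[key] = self.external[key]
            self.fetches += 1
        return self.on_chip[key]
```

Counting `fetches` makes the benefit visible: repeated descriptions of nearby keypoints cost no additional external-memory bandwidth.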
According to still another aspect of the embodiments of the present invention, there is also provided an electronic device for implementing the above visual feature extraction method, where the electronic device may be the terminal device or the server shown in fig. 10. This embodiment is described with the electronic device as an example. As shown in fig. 10, the electronic device includes a processor 1002 and a memory 1004; a computer program is stored in the memory 1004, and the processor 1002 is configured to execute the steps in any of the above method embodiments by means of the computer program.
Optionally, in this embodiment, the electronic device may be located in at least one network device of a plurality of network devices of a computer network.
Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:
s1, for each frame of image in a target image set, carrying out image preprocessing on an Nth frame of image in the target image set at a first starting moment; the target image set is an image frame sequence of visual features to be extracted, the image preprocessing comprises a down-sampling operation, and N is a positive integer greater than or equal to 1;
s2, performing visual feature extraction operation on the Nth frame of image at a second starting moment, and synchronously performing image preprocessing on the (N + 1) th frame of image until each frame of image finishes image preprocessing and visual feature extraction operation, wherein the time point of the first starting moment is before the second starting moment;
and S3, outputting the visual characteristics of each frame of image in the target image set.
Alternatively, as can be understood by those skilled in the art, the structure shown in fig. 10 is only illustrative; the electronic device may also be a terminal device such as a smartphone (e.g., an Android phone or an iOS phone), a tablet computer, a palmtop computer, or a Mobile Internet Device (MID). Fig. 10 does not limit the structure of the above electronic device. For example, the electronic device may include more or fewer components (e.g., a network interface) than shown in fig. 10, or have a different configuration from that shown in fig. 10.
The memory 1004 may be used to store software programs and modules, such as program instructions/modules corresponding to the visual feature extraction method and apparatus in the embodiments of the present invention; the processor 1002 runs the software programs and modules stored in the memory 1004 to execute various functional applications and data processing, that is, to implement the visual feature extraction method described above. The memory 1004 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 1004 may further include memory located remotely from the processor 1002, which may be connected to the terminal via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof. The memory 1004 may be used, in particular but not exclusively, for storing information such as video frames and image features. As an example, as shown in fig. 10, the memory 1004 may include, but is not limited to, the first processing unit 902, the second processing unit 904, and the output unit 906 of the above visual feature extraction apparatus. It may further include, but is not limited to, other module units of the visual feature extraction apparatus, which are not described again in this example.
Optionally, the above transmission device 1006 is used to receive or send data via a network. Specific examples of the network may include wired and wireless networks. In one example, the transmission device 1006 includes a Network Interface Card (NIC), which can be connected to a router and other network devices via a network cable so as to communicate with the Internet or a local area network. In another example, the transmission device 1006 is a Radio Frequency (RF) module, which is used to communicate with the Internet wirelessly.
In addition, the electronic device further includes: a display 1008 for displaying the visual features; and a connection bus 1010 for connecting the respective module parts in the above-described electronic apparatus.
In other embodiments, the terminal device or the server may be a node in a distributed system, where the distributed system may be a blockchain system, and the blockchain system may be a distributed system formed by connecting a plurality of nodes through a network communication. The nodes may form a Peer-To-Peer (P2P) network, and any type of computing device, such as a server, a terminal, and other electronic devices, may become a node in the blockchain system by joining the Peer-To-Peer network.
According to an aspect of the application, there is provided a computer program product or a computer program comprising computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the visual feature extraction method described above, the computer program being arranged to perform the steps of any of the above method embodiments when executed.
Alternatively, in the present embodiment, the above-mentioned computer-readable storage medium may be configured to store a computer program for executing the steps of:
s1, for each frame of image in a target image set, carrying out image preprocessing on an Nth frame of image in the target image set at a first starting moment; the target image set is an image frame sequence of visual features to be extracted, the image preprocessing comprises a down-sampling operation, and N is a positive integer greater than or equal to 1;
s2, performing visual feature extraction operation on the Nth frame of image at a second starting moment, and synchronously performing image preprocessing on the (N + 1) th frame of image until each frame of image finishes image preprocessing and visual feature extraction operation, wherein the time point of the first starting moment is before the second starting moment;
and S3, outputting the visual characteristics of each frame of image in the target image set.
Alternatively, in this embodiment, those skilled in the art will understand that all or part of the steps of the methods in the above embodiments may be implemented by a program instructing hardware related to a terminal device, and the program may be stored in a computer-readable storage medium. The storage medium may include: a flash disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and the like.
The above-mentioned serial numbers of the embodiments of the present invention are only for description, and do not represent the advantages and disadvantages of the embodiments.
The integrated unit in the above embodiments, if implemented in the form of a software functional unit and sold or used as a separate product, may be stored in the above computer-readable storage medium. Based on such understanding, the technical solution of the present invention, in essence or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium and including instructions for causing one or more computer devices (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present invention.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed client may be implemented in other manners. The above-described apparatus embodiments are merely illustrative, and for example, a division of a unit is only a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that it is obvious to those skilled in the art that various modifications and improvements can be made without departing from the principle of the present invention, and these modifications and improvements should also be considered as the protection scope of the present invention.
Claims (10)
1. A visual feature extraction method, comprising:
for each frame of image in a target image set, performing image preprocessing on an Nth frame of image in the target image set at a first starting moment; the target image set is an image frame sequence of visual features to be extracted, the image preprocessing comprises a down-sampling operation, and N is a positive integer greater than or equal to 1;
performing visual feature extraction operation on the Nth frame of image at a second starting time, and synchronously performing image preprocessing on the (N + 1) th frame of image until each frame of image finishes image preprocessing and visual feature extraction operation, wherein the time point of the first starting time is before the second starting time;
and outputting the visual characteristics of each frame of image in the target image set.
2. The method of claim 1, wherein the visual feature extraction operation comprises feature point detection and feature point description, the method further comprising:
sequentially executing the following operations on each frame of image in the target image set until each coding unit in each frame of image completes feature point detection and feature point description:
performing feature point detection on the Nth coding unit at a third starting moment;
performing feature point description on the Nth coding unit at a fourth starting moment, and synchronously performing feature point detection on the (N + 1) th coding unit; wherein N is a positive integer greater than or equal to 1, and the time point of the third starting time is before the fourth starting time.
3. The method of claim 2, wherein the feature point detection comprises keypoint detection and keypoint fine localization, the method further comprising:
sequentially executing the following operations on each pixel point in the current coding unit until each pixel point in each coding unit completes key point detection and key point fine positioning:
detecting key points of the Nth pixel point at a fifth starting moment;
carrying out key point fine positioning on the Nth pixel point at a sixth starting moment, and synchronously carrying out key point detection on the (N + 1) th pixel point; the N is a positive integer greater than or equal to 1, and the time point of the fifth starting time is before the sixth starting time.
4. The method of claim 3, further comprising:
acquiring a target Gaussian image from an external memory, wherein the target Gaussian image is an image used by key pixel points with similar spatial positions in the process of describing feature points;
and storing the target Gaussian image into an on-chip memory so that the key pixel points with similar spatial positions share the target Gaussian image.
5. The method of claim 1, wherein the visual feature of each frame of image is a Scale-Invariant Feature Transform (SIFT) feature.
6. The method of claim 1, further comprising:
based on a preset hardware processing module, carrying out image preprocessing and visual feature extraction operation on each frame of image in the target image set;
performing feature soft processing on each frame of image in the target image set based on a software processing module associated with the hardware processing module, wherein the feature soft processing comprises feature aggregation, feature compression and coordinate compression.
7. A visual feature extraction device characterized by comprising:
the first processing unit is used for carrying out image preprocessing on the Nth frame image in the target image set at a first starting moment for each frame image in the target image set; the target image set is an image frame sequence of visual features to be extracted, the image preprocessing comprises a down-sampling operation, and N is a positive integer greater than or equal to 1;
the second processing unit is used for performing visual feature extraction operation on the Nth frame of image at a second starting time, and synchronously performing image preprocessing on the (N + 1) th frame of image until each frame of image finishes image preprocessing and visual feature extraction operation, wherein the time point of the first starting time is before the second starting time;
and the output unit is used for outputting the visual characteristics of each frame of image in the target image set.
8. The apparatus of claim 7, wherein the visual feature extraction operation comprises feature point detection and feature point description, the apparatus further comprising:
a third processing unit, configured to perform the following operations on each frame image in the target image set in sequence until each coding unit in each frame image completes feature point detection and feature point description:
performing feature point detection on the Nth coding unit at a third starting moment;
performing feature point description on the Nth coding unit at a fourth starting moment, and synchronously performing feature point detection on the (N + 1)th coding unit; wherein N is a positive integer greater than or equal to 1, and the time point of the third starting time is before the fourth starting time.
9. The apparatus of claim 8, wherein the feature point detection comprises keypoint detection and keypoint fine localization, the apparatus further comprising:
the fourth processing unit is used for sequentially executing the following operations on each pixel point in the current coding unit until each pixel point in each coding unit completes the key point detection and key point fine positioning:
detecting key points of the Nth pixel point at a fifth starting moment;
carrying out key point fine positioning on the Nth pixel point at a sixth starting moment, and synchronously carrying out key point detection on the (N + 1) th pixel point; and N is a positive integer greater than or equal to 1, and the time point of the fifth starting time is before the sixth starting time.
10. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to execute the method of any of claims 1 to 6 by means of the computer program.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211071629.6A CN115631339A (en) | 2022-09-02 | 2022-09-02 | Visual feature extraction method and device and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115631339A true CN115631339A (en) | 2023-01-20 |
Family
ID=84902153
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211071629.6A Pending CN115631339A (en) | 2022-09-02 | 2022-09-02 | Visual feature extraction method and device and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115631339A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116912246A (en) * | 2023-09-13 | 2023-10-20 | 潍坊医学院 | Tumor CT data processing method based on big data |
CN116912246B (en) * | 2023-09-13 | 2023-12-29 | 潍坊医学院 | Tumor CT data processing method based on big data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||