WO2023082641A1 - Electronic archive generation method, apparatus, terminal device, and storage medium (电子档案生成方法、装置、终端设备及存储介质) - Google Patents

Electronic archive generation method, apparatus, terminal device, and storage medium (电子档案生成方法、装置、终端设备及存储介质)

Info

Publication number
WO2023082641A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature information
target object
archived
target
target objects
Prior art date
Application number
PCT/CN2022/099852
Other languages
English (en)
French (fr)
Inventor
余晓填
王孝宇
Original Assignee
深圳云天励飞技术股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳云天励飞技术股份有限公司
Publication of WO2023082641A1

Classifications

    • (All classifications fall under section G: Physics; class G06: Computing; Calculating or Counting)
    • G06F 18/23: Electric digital data processing; pattern recognition; analysing; clustering techniques
    • G06F 16/113: Information retrieval; file systems; file system administration; details of archiving
    • G06F 16/55: Information retrieval of still image data; clustering; classification
    • G06F 18/22: Electric digital data processing; pattern recognition; analysing; matching criteria, e.g. proximity measures
    • G06N 3/08: Computing arrangements based on biological models; neural networks; learning methods

Definitions

  • The present application belongs to the technical field of data processing, and in particular relates to an electronic archive generation method, apparatus, terminal device, and storage medium.
  • Visual data generally includes video data and picture data.
  • The automatic generation of electronic archives from visual data is the process of governing and analyzing data about user-defined target objects and clustering that data according to certain rules. For example, given a certain number of terminal camera snapshots and a user-defined target object of "person", automatic archive generation gathers the snapshots of the same person together and separates the snapshots of different people; the resulting per-person snapshot archive is called one file per person in engineering practice.
  • Depending on the user's definition, the target object of an automatically generated visual data archive can be a person, an object, or even the embodiment of an abstract concept. Engineering practice includes one file per person, one file per car, and one file per relationship.
  • In one file per relationship, the user-defined target object is an abstract concept; the specific relationship may be a friendship, a family relationship, and so on. One file per relationship over visual data enables the modeling, analysis, and mining of human social relationships.
  • The current automatic generation of visual data electronic archives generally clusters visual data based on single-dimensional feature information, in particular single-dimensional visual feature information. For example, in one file per person, traditional models and algorithms rely on the visual features of the target object in pictures or videos (akin to key-point features of the target body).
  • As a result, the clustering accuracy of traditional automatic generation methods for visual data electronic archives is not high, and it often fails to meet business requirements in practical engineering applications.
  • The embodiments of the present application provide an electronic archive generation method, apparatus, terminal device, and storage medium, which can solve the technical problem that traditional automatic generation methods for visual data electronic archives have low clustering accuracy and often fail to meet business requirements in actual engineering applications.
  • In a first aspect, an embodiment of the present application provides an electronic archive generation method, including: acquiring multiple images to be archived; performing target object recognition on each image to be archived to obtain visual feature information of all target objects corresponding to the multiple images to be archived; acquiring auxiliary feature information corresponding to each target object from the multiple images to be archived, where the auxiliary feature information represents time or space information associated with the corresponding target object; and clustering all the target objects according to the visual feature information and the auxiliary feature information corresponding to each target object to obtain multiple electronic archives.
  • In a second aspect, an embodiment of the present application provides an electronic archive generation apparatus, including: an image acquisition unit configured to acquire multiple images to be archived; a first feature information acquisition unit configured to perform target object recognition on each image to be archived and obtain visual feature information of all target objects corresponding to the multiple images to be archived; a second feature information acquisition unit configured to acquire, from the multiple images to be archived, auxiliary feature information corresponding to each target object, the auxiliary feature information representing the time or space information associated with the corresponding target object; and an electronic archive forming unit configured to cluster all the target objects according to the visual feature information and the auxiliary feature information corresponding to each target object to obtain multiple electronic archives.
  • In a third aspect, an embodiment of the present application provides a terminal device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor; when the processor executes the computer program, the steps of the method described in the first aspect above are implemented.
  • In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program; when the computer program is executed by a processor, the steps of the method described in the first aspect above are implemented.
  • In a fifth aspect, an embodiment of the present application provides a computer program product which, when run on a terminal device, causes the terminal device to execute the steps of the method described in the first aspect above.
  • In the embodiments of the present application, target object recognition is performed on multiple images to be archived, and all target objects are clustered according to the visual feature information and auxiliary feature information of all target objects corresponding to the multiple images to be archived, obtaining multiple electronic archives. This avoids the low clustering accuracy caused by relying on the target objects' visual feature information alone for clustering.
  • The auxiliary feature information represents the time or space information associated with the target object; introducing this time or space information into the archiving process through auxiliary features effectively supplements the target object's visual feature information. Especially for visual data in which the target object's visual feature information is insufficient, or whose data quality is low, archiving accuracy can be effectively improved, so as to meet the business requirements of actual engineering applications.
  • FIG. 1 is a schematic diagram of an application scenario provided by an embodiment of the present application;
  • FIG. 2 is a schematic flowchart of an electronic archive generation method provided by an embodiment of the present application;
  • FIG. 3 is a schematic structural diagram of an electronic archive generation apparatus provided by an embodiment of the present application;
  • FIG. 4 is a schematic structural diagram of a terminal device provided by an embodiment of the present application.
  • The term "if" may be construed, depending on the context, as "when", "once", "in response to determining", or "in response to detecting".
  • Similarly, the phrase "if determined" or "if [the described condition or event] is detected" may be construed, depending on the context, to mean "once determined", "in response to the determination", "once [the described condition or event] is detected", or "in response to detection of [the described condition or event]".
  • References to "one embodiment", "some embodiments", or the like in the specification of the present application mean that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application.
  • Appearances of the phrases "in one embodiment", "in some embodiments", "in other embodiments", and so on in various places in this specification do not necessarily all refer to the same embodiment, but mean "one or more but not all embodiments" unless specifically stated otherwise.
  • The terms "including", "comprising", "having", and variations thereof mean "including but not limited to", unless specifically stated otherwise.
  • Electronic archives in the embodiments of the present application specifically refer to clustering visual data with the same target object attribute into one file, while visual data with different target object attributes are divided into different files.
  • The target object attribute of an archive is user-defined. For example, when the attribute of the target object is defined as a person, the electronic archive generated in the embodiment of the present application is a collection of visual data organized per person. The attribute of the target object can be set as required; for example, it may be a person or a car.
  • The electronic archive generation method can be applied to the scenario shown in FIG. 1, which includes multiple acquisition devices 11 and an electronic archive generation device 12. The acquisition devices are arranged in different positions and send the images they capture to the electronic archive generation device 12, which generates electronic archives: it establishes a file for each individual target object involved in all the images and classifies the corresponding images into the corresponding electronic archives.
  • The multiple acquisition devices 11 can be installed at different positions within a fixed area, for example at different positions of a shopping mall, such as its elevator entrances, entrances, escalator entrances, and exits; the electronic archive generation device 12 then archives the people who appear across the different acquisition devices, generating one file for each person in the mall.
  • The electronic archive generation device in the embodiments of the present application may be a server, a mobile phone, a tablet computer, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), or other terminal equipment. The embodiments of the present application do not impose any limitation on the specific type of the terminal device.
  • In some embodiments, the acquisition device 11 has a certain data processing capability and can pre-process images before sending them to the electronic archive generation device 12. The pre-processing may consist of performing structured processing on the captured image and extracting the visual feature information of the target object; the structured image and the corresponding visual feature information are then transmitted to the electronic archive generation device 12, which completes the subsequent archiving operations. Such on-device processing can reduce the data processing load on the electronic archive generation device 12.
  • The acquisition device in the embodiments of the present application may be a camera with a data processing function, or a mobile phone, tablet computer, notebook computer, or other terminal device with a camera function; the embodiments of the present application do not limit the specific type of acquisition device. The acquisition device may also be an ordinary camera.
  • FIG. 2 shows a schematic flowchart of the electronic archive generation method provided by the present application. Referring to FIG. 2, the method includes:
  • Step S201: acquire multiple images to be archived.
  • The images to be archived in the embodiments of the present application come from visual data, which may be picture data and/or video data; an image to be archived may be an image obtained directly from an acquisition device, or an image frame obtained from video data captured by an acquisition device.
  • The multiple images to be archived may be obtained directly from the acquisition device, or may have been obtained from the acquisition device and stored in advance.
  • The images to be archived need to be structured before feature extraction; the structured processing of a picture refers to extracting the pixel values of its pixels to obtain a pixel-value array.
  • A pooling operation can also be performed on the structured images so that all images have a uniform size, which facilitates the subsequent feature information extraction and feature fusion steps.
  • Each image to be archived may contain one or more target objects.
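  • As a non-authoritative illustration, the following minimal Python sketch shows the preprocessing just described: decoding each picture into an array of pixel values ("structured processing") and pooling/resizing to a uniform size. The library choices (Pillow, NumPy) and the 224 x 224 target size are assumptions for illustration, not part of the original disclosure.

```python
# Sketch of the structuring + pooling step described above (assumed tooling).
from PIL import Image
import numpy as np

TARGET_SIZE = (224, 224)  # assumed uniform size for downstream feature extraction

def structure_image(path: str) -> np.ndarray:
    """Decode an image file into an H x W x 3 array of pixel values."""
    return np.asarray(Image.open(path).convert("RGB"))

def pool_to_uniform(pixels: np.ndarray) -> np.ndarray:
    """Bring a structured image to a uniform size (stand-in for pooling)."""
    return np.asarray(Image.fromarray(pixels).resize(TARGET_SIZE))

# Usage: batch = [pool_to_uniform(structure_image(p)) for p in image_paths]
```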
  • Step S202: perform target object recognition on each image to be archived, and obtain the visual feature information of all target objects corresponding to the multiple images to be archived.
  • The visual feature information is a visual feature of the target object itself and specifically refers to a feature that can distinguish different individual target objects from a visual perspective; different individuals have different visual feature information. For example, if the archiving target object is a person, i.e. the images to be archived are to be formed into electronic archives at one file per person, the visual feature information can be facial features or human body features.
  • "All target objects corresponding to multiple images to be archived" means all the target objects contained in the multiple images to be archived. For example, suppose the first image to be archived contains 1 target object, the second contains 3, the third contains 3, and the fourth contains 2; then the number of all target objects corresponding to the four images, i.e. the sum over the images, is 9.
  • The recognition of the target object is the process of extracting its visual feature information. The visual feature information can be obtained by using a pre-trained deep learning network model to extract features from the image to be archived; the deep learning network model in the embodiments of this application can be a convolutional neural network (CNN), a residual network (ResNet), a graph convolutional neural network (GCN), a Transformer, etc., and is not specifically limited here. Using a deep learning network model to extract visual feature information is a conventional technique in the field and will not be detailed here.
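  • For illustration only, a hedged sketch of visual feature extraction with a pre-trained backbone follows. ResNet-50 with its classification head removed is one common choice; the patent names CNN/ResNet/GCN/Transformer as possibilities without prescribing an architecture, so everything below is an assumption.

```python
# Sketch: visual feature extraction with a pre-trained ResNet-50 (assumed choice).
import torch
import torchvision.models as models
import torchvision.transforms as T

backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()  # drop the classifier; keep 2048-d features
backbone.eval()

preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def visual_features(crop) -> torch.Tensor:
    """Return an L2-normalized feature vector for one target-object crop (PIL image)."""
    x = preprocess(crop).unsqueeze(0)  # 1 x 3 x 224 x 224
    f = backbone(x).squeeze(0)         # 2048-d feature vector
    return f / f.norm()
```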
  • Step S203: acquire auxiliary feature information corresponding to each target object from the multiple images to be archived, where the auxiliary feature information represents time or space information associated with the corresponding target object.
  • The auxiliary feature information in this embodiment is time-related or space-related information associated with the corresponding target object. It can supplement the target object's visual feature information from the perspective of time or space, making full use of the relevant information in the images to be archived and laying the foundation for subsequently achieving accurate clustering of those images.
  • The auxiliary feature information may include background feature information. The background feature information specifically refers to the background information that remains after the visual feature information of the target object is removed from the image to be archived corresponding to that target object.
  • The background feature information may be acquired as follows: perform differential processing between the image to be archived corresponding to each target object and the visual feature information of that target object, obtaining the background feature information of each target object.
  • The background feature information forms a good supplement to the visual feature information of the target object when clustering the target objects in the images to be archived.
  • For example, if an image to be archived includes 3 target objects, numbered a, b, and c, the background feature information of target object a refers to the image information remaining after target object a is removed from the image; that is, target objects b and c in that image are both part of the background feature information of target object a. The background feature information of target objects b and c can be obtained in the same way.
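  • A minimal sketch of the "differential processing" above: removing the target's region from its source image so that what remains serves as background feature information. Representing the target by a bounding box is an assumption; the patent only states that the target's visual information is subtracted from the image.

```python
# Sketch: background feature information via differencing (bounding box assumed).
import numpy as np

def background_of(image: np.ndarray, target_box: tuple) -> np.ndarray:
    """Zero out the target's bounding box; everything else is its background."""
    x1, y1, x2, y2 = target_box
    bg = image.copy()
    bg[y1:y2, x1:x2] = 0  # other targets (e.g. b and c for target a) remain
    return bg
```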
  • The auxiliary feature information may also include acquisition device attribute information. Acquisition device attribute information refers to information attached to the target object by virtue of certain attributes of the acquisition device, and in particular to acquisition-device-related information that affects the target object's visual feature information. It can provide additional information support for forming high-accuracy electronic archives from visual data.
  • The acquisition device attribute information in this embodiment may include target object density information, target object acquisition angle information, and/or sharpness information. All three have a considerable impact on the quality of the extracted visual feature information of the target object, and the requirements placed on visual feature information during archiving can be adjusted accordingly. For example, when comparing the similarity of two target objects belonging to different images to be archived according to their visual feature information, different similarity thresholds can be set for different acquisition device attribute information (target object density information, target object acquisition angle information, and/or sharpness information).
  • The target object density information refers to the number of target objects in the image to be archived corresponding to the target object. It is related to the installation location of the acquisition device, which is why the embodiments of the present application classify it as acquisition device attribute information; it follows that the target object density information is associated with the spatial information of the image to be archived.
  • For example, when the target object is a person, the target object density refers to the density of people in the image to be archived corresponding to a certain target object, i.e. the head count. Crowd density has a large impact on the extraction quality of the target object's visual feature information: the visual feature information extracted from an image with high crowd density is of lower quality, so during subsequent clustering the requirements on that image must be relaxed in order to achieve accurate clustering.
  • For example, if clustering is performed by comparing the similarity of the target objects' visual feature information, then for an image to be archived with low crowd density two objects may be judged to be the same person when the similarity exceeds 80%, while for an image with high crowd density a similarity above 60% may suffice.
  • The target object density information may be the density in the current image to be archived corresponding to the target object, or the average density over a preset historical time period. The average target object density information refers to the average of the density values of the historical images acquired, within that period, by the same acquisition device that acquired the image to be archived.
  • In one embodiment, the target object density information is obtained by counting the target objects in the image to be archived corresponding to each target object and using the resulting count as that target object's density information.
  • In another embodiment, it is obtained by acquiring multiple historical images with the same acquisition conditions as the current image to be archived (the same acquisition conditions meaning the same acquisition parameters at the same installation location, e.g. the same acquisition device), counting the target objects in each historical image, and averaging the counts; the average is used as the density information of the corresponding target object.
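  • The sketch below illustrates both ways of obtaining density information described above (per-image count and historical average), plus the density-dependent similarity threshold from the 80%/60% example. Detections are assumed to be lists of boxes, and the density cut-off of 10 is an invented value for illustration.

```python
# Sketch: target object density information and a density-dependent threshold.
from statistics import mean

def density(detections: list) -> int:
    """Per-image density = number of detected target objects."""
    return len(detections)

def avg_density(historical_detections: list) -> float:
    """Average density over historical images captured under the same
    acquisition conditions (same device / installation / parameters)."""
    return mean(len(d) for d in historical_detections)

def similarity_threshold(density_value: float) -> float:
    """Relax the same-person threshold for crowded scenes (cut-off assumed)."""
    return 0.6 if density_value > 10 else 0.8
```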
  • The target object acquisition angle information refers to the deflection angle of the target object; optionally, the deflection angle may include three angles: roll, pitch, and yaw. The acquisition angle of the target object is related to the installation position and orientation of the device.
  • For example, when the target object is a person, the acquisition angle information refers to the angle of the captured face. The acquisition device is assumed to be fixed by default, so the acquisition angle is related both to the person's activity pattern and to how the acquisition device is installed. Therefore, the embodiments of the present application classify the acquisition angle information as device attribute information, and it follows that the acquisition angle information is associated with the spatial information of the image to be archived corresponding to the target object.
  • The target object acquisition angle information may be the acquisition angle in the current image to be archived corresponding to the target object, or a historical average over a preset historical time period.
  • In one embodiment, the acquisition angle is obtained by inputting the current image to be archived into a pre-trained acquisition angle model for processing, obtaining the target object's acquisition angle information.
  • In another embodiment, it is obtained by acquiring multiple historical images with the same acquisition conditions as the current image to be archived (the same acquisition parameters at the same installation location, e.g. the same acquisition device), inputting the historical images into the pre-trained acquisition angle model to obtain multiple historical acquisition angle values, and averaging them to obtain the target object's acquisition angle information.
  • The sharpness information may be the sharpness of the current image to be archived corresponding to the target object, or a historical average over a preset historical time period.
  • In one embodiment, the sharpness information is obtained by inputting the current image to be archived into a pre-trained sharpness model for processing.
  • In another embodiment, it is obtained by acquiring multiple historical images with the same acquisition conditions as the current image to be archived (the same acquisition parameters at the same installation location, e.g. the same acquisition device), inputting them into the pre-trained sharpness model to obtain multiple historical sharpness values, and averaging them to obtain the sharpness information.
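  • The acquisition angle and sharpness procedures share one pattern: run a pre-trained model per image, then average over same-condition historical images to obtain a representative value. The sketch below assumes `model` is any callable returning a scalar (e.g. a sharpness score) or a vector (e.g. roll/pitch/yaw); the models themselves are not specified by the patent.

```python
# Sketch: historical averaging of per-image model outputs (models assumed given).
import numpy as np

def historical_average(model, historical_images: list) -> np.ndarray:
    """Average a model's outputs over historical images captured under the
    same acquisition conditions; reduces per-image noise (see below)."""
    outputs = [np.atleast_1d(model(img)) for img in historical_images]
    return np.mean(outputs, axis=0)
```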
  • Both the acquisition angle model and the sharpness model in the embodiments of the present application can be deep learning network models; specifically, the deep learning network model can be a convolutional neural network (CNN), a residual network (ResNet), a graph convolutional neural network (GCN), a Transformer, etc. The models for the different purposes are obtained by training on different data sets with conventional training methods. Using a deep learning network model to extract the target object's acquisition angle and sharpness information is a conventional technique in this field and will not be repeated here.
  • A single image's acquisition angle or sharpness measurement is contingent, and this contingency generates relatively large noise when used for clustering. Using historical averages reduces the noise caused by chance, makes the obtained acquisition angle and sharpness information more representative, and reduces the impact of noise on clustering accuracy.
  • The acquisition angle information and sharpness information of the target object have a great influence on the extraction of its visual feature information.
  • Taking the application scenario where the target object is a person as an example: the acquisition angle of the target object can yield a frontal face or a side face. When the image to be archived shows a frontal face, the extracted visual feature information (facial features) is of better quality than for a side face, so the subsequent clustering effect is better. As for sharpness, the higher the sharpness of the face, the better the subsequent clustering effect.
  • The auxiliary feature information may also include spatio-temporal sequence feature information, which characterizes the activity pattern of the corresponding target object in the target scene. The spatio-temporal sequence feature information is associated with the target object and is obtained by summarizing the target object's activity patterns across all images to be archived.
  • The specific role of spatio-temporal sequence feature information is to compensate for visual data whose target object visual feature information is insufficient, so that such visual data can still be clustered. For example, when the target object is a person and an image to be archived cannot be correctly clustered because of the face angle or deficient body feature information, the spatio-temporal sequence feature information associated with the target object can solve the problem: if the spatio-temporal sequence feature information shows that target object A often appears on acquisition device 1 at 8:00 a.m., then when an image captured by acquisition device 1 at 8:00 a.m. contains a target object whose visual features are unclear, it can be judged that the probability of that target object being A is relatively high, thereby completing the visual data.
  • In one embodiment, acquiring the auxiliary feature information corresponding to each target object from the multiple images to be archived may include: pre-clustering all target objects according to their visual feature information to obtain Y target object sets; obtaining the spatio-temporal feature information of the images to be archived corresponding to the target objects in each set, where the spatio-temporal feature information is the acquisition time of the image and the serial number of the acquisition device; and then calculating the spatio-temporal sequence feature information corresponding to each target object set. The spatio-temporal sequence feature information corresponding to each set is used as the spatio-temporal sequence feature information of every target object in that set.
  • The Y target object sets are obtained through pre-clustering, where all target objects included in one set are, by the pre-clustering standard, the same individual. If an image to be archived includes multiple target objects, those target objects belong to different sets. For example, in an application scenario where the target object is a person, obtaining Y target object sets through pre-clustering amounts to a preliminary judgment that all the images to be archived include Y different people.
  • Calculating the spatio-temporal sequence feature information corresponding to each target object set specifically includes sorting by time; within one set, the spatio-temporal sequence feature information of all target objects is the same, and it describes how the individual corresponding to the set appears on different acquisition devices over time.
  • In one embodiment, acquiring the auxiliary feature information corresponding to each target object from the multiple images to be archived further includes: for a first target object that cannot be pre-clustered into any of the Y target object sets, comparing the similarity between the visual feature information of the first target object and the visual feature information of each target object belonging to the Y sets, obtaining the top K target objects with the highest similarity; adding the first target object to a candidate target object set, where the candidate set is the one among the Y sets that contains the largest number of the top K target objects, thereby obtaining updated target object sets; and using the spatio-temporal sequence feature information of each updated set as the spatio-temporal sequence feature information of all target objects in that updated set.
  • The method for calculating the spatio-temporal sequence feature information corresponding to each updated target object set is essentially the same as for the original target object sets and is not repeated here.
  • The method for obtaining spatio-temporal sequence feature information is described below in an application scenario where the target object is a person. Assume 1000 images to be archived, coming from 10 acquisition devices, with the target object being the person in the image. In step S202, target object recognition (e.g. face recognition) has been performed on each image to be archived, and the visual feature information (e.g. facial features) of all target objects in the 1000 images has been obtained; in step S203 the remaining auxiliary feature information of each target object is acquired.
  • The first case: assume the 1000 images to be archived yield a total of 1500 target objects, and all target objects are pre-clustered according to their visual feature information. Suppose 20 target object sets are obtained and every target object can be clustered into one of them; then the 20 sets correspond to 20 individual target objects, i.e. the 1000 images involve roughly 20 different individuals.
  • Suppose target object A corresponds to target object set B1, and B1 contains 60 target objects, each corresponding to an image to be archived. The 60 target objects in B1 are sorted according to the time of their corresponding images; the acquisition time and acquisition device number of each of the 60 sorted images are then extracted in order, yielding the acquisition time sequence and acquisition device number sequence corresponding to set B1. The spatio-temporal sequence feature information of every target object in B1 is this pair of sequences. Performing this operation once for each target object set yields the spatio-temporal sequence feature information of all target objects.
  • The acquisition time can be divided according to the 24 hours of a day, i.e. each acquisition time takes one of 24 values, and in this embodiment there are 10 acquisition device numbers.
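  • The following sketch reproduces the example above for one pre-clustered target object set: sort the set's images by acquisition time, then read off the acquisition time sequence (hour of day, one of 24 values) and the acquisition device number sequence. The record fields and sample values are illustrative assumptions.

```python
# Sketch: spatio-temporal sequence features for one target object set (e.g. B1).
from datetime import datetime

records = [  # (acquisition time, acquisition device number), one per image in the set
    (datetime(2021, 11, 9, 8, 3), 1),
    (datetime(2021, 11, 9, 12, 40), 7),
    (datetime(2021, 11, 9, 8, 15), 1),
]

records.sort(key=lambda r: r[0])              # sort by acquisition time
time_sequence = [r[0].hour for r in records]  # -> [8, 8, 12]
device_sequence = [r[1] for r in records]     # -> [1, 1, 7]
# (time_sequence, device_sequence) is shared by every target object in the set.
```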
  • The second case: assume the 1000 images to be archived again yield a total of 1500 target objects, and pre-clustering by visual feature information gives 20 target object sets corresponding to 20 individuals, but there is a target object X that cannot be clustered into any of the 20 sets. In this case target object X must be processed separately.
  • The specific processing is as follows: compare the visual feature information of target object X with the visual feature information of the other 1499 target objects and select the 100 target objects with the highest similarity. If, among the 20 target object sets, set B2 contains the most of those 100 target objects, then B2 is the candidate target object set, and target object X is added to B2. The 20 target object sets are thereby updated, giving 20 updated sets, and the spatio-temporal sequence feature information of all target objects is obtained from the updated sets using the same operation as for set B1 in the first case.
  • This method of obtaining spatio-temporal sequence feature information can handle a target object X that cannot be pre-clustered. In a conventional pre-clustering process, X might be arbitrarily assigned to some target object set or simply discarded; based on the method in the embodiments of the present application, X is pre-clustered in a principled way, avoiding random classification or abandonment. Compared with conventional clustering, the method of the present application therefore yields more accurate electronic archives later on.
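  • A hedged sketch of the separate handling of target object X: compare X with every clustered target object, take the top K most similar (K = 100 in the example above), and add X to whichever set contributed the most of those K. Cosine similarity on L2-normalized features is an assumption; the patent only requires a similarity comparison.

```python
# Sketch: assigning an un-pre-clustered target object X by top-K similarity vote.
import numpy as np
from collections import Counter

def assign_unclustered(x_feat: np.ndarray, feats: np.ndarray, set_ids, k: int = 100):
    """feats: (N, d) unit feature vectors of clustered objects; set_ids: the
    target object set each belongs to. Returns the candidate set for X."""
    sims = feats @ x_feat                # cosine similarity (unit vectors assumed)
    top_k = np.argsort(sims)[::-1][:k]   # indices of the K most similar objects
    votes = Counter(set_ids[i] for i in top_k)
    return votes.most_common(1)[0][0]    # set holding the most of the top K
```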
  • Step S204: cluster all target objects according to the visual feature information and auxiliary feature information corresponding to each target object, obtaining multiple electronic archives.
  • Each electronic archive corresponds to a specific individual target object, i.e. the electronic archives are filed by target object. Each archive includes at least one image to be archived, and all the images in one archive contain the same specific individual; this applies to one file per person or one file per car in engineering practice.
  • All target objects are clustered according to the target object's visual feature information and auxiliary feature information; introducing auxiliary feature information brings the time or space information related to the target object into the archiving process, effectively supplementing the target object's visual feature information. Especially for visual data with insufficient visual feature information or low data quality, this effectively improves archiving accuracy and meets the business requirements of actual engineering applications.
  • In one embodiment, step S204 may be: input the visual feature information and auxiliary feature information corresponding to all target objects into a pre-trained archiving model for processing, obtaining the similarity results between target objects; according to the similarity results, cluster all target objects to obtain multiple electronic archives corresponding to individual target objects.
  • The source of the archiving model's training samples is the same as, or similar to, the source of the images to be archived. The same source means the acquisition devices corresponding to the training samples and those corresponding to the images to be archived belong to the same batch; a similar source means the two groups of acquisition devices have similar settings and similar topological structures.
  • The expression of the archiving model is shown in formula (1):
  • D = f1(A1) + f2(A2) + f3(A3) + f4(A4) = W1A1 + W2A2 + W3A3 + W4A4    (1)
  • where D is the archive matrix of dimension M × M, with M the number of all target objects corresponding to the images to be archived; A1 represents the visual feature information set of all target objects, A2 the background feature information set, A3 the spatio-temporal sequence feature information set, and A4 the device attribute information set.
  • The functions f1, f2, f3, and f4 respectively perform a matrix transformation on the corresponding set (fi(Ai) = WiAi), yielding component matrices that characterize the similarity between any pair of target objects; summing the four component matrices gives the final archive matrix.
  • The dimension of W1 is M × P1 and the dimension of A1 is P1 × M; the dimension of W2 is M × P2 and the dimension of A2 is P2 × M; the dimension of W3 is M × P3 and the dimension of A3 is P3 × M; the dimension of W4 is M × P4 and the dimension of A4 is P4 × M. P1, P2, P3, and P4 are the dimensions of the data corresponding to A1, A2, A3, and A4 respectively.
  • The archive matrix obtained through the model is a square matrix, and its dimension M equals the number of all target objects corresponding to the images to be archived; in general, the number of all target objects is greater than or equal to the number of images to be archived.
  • Taking the visual feature component as an example: the first column of A1 represents the visual feature information corresponding to the first target object; multiplying the second row of W1 by the first column of A1 gives the similarity result, with respect to visual feature information, between the first and second target objects. Proceeding in this way for every pair yields the similarity results, with respect to visual feature information, between any two target objects, i.e. the matrix W1A1 is obtained. The matrices W2A2, W3A3, and W4A4 are calculated similarly, and the archive matrix D is obtained by matrix summation. W1-W4 are the archive model's linear transformation parameters.
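  • As a direct, non-authoritative sketch of formula (1): each feature set Ai (Pi × M) is mapped by a learned linear transform Wi (M × Pi) into an M × M component matrix, and the four components are summed into the archive matrix D. Random arrays stand in for learned parameters and real features; the shapes P1..P4 are invented for illustration.

```python
# Sketch of formula (1): D = W1A1 + W2A2 + W3A3 + W4A4 (random stand-in data).
import numpy as np

M = 4                                    # number of target objects
P = [2048, 512, 64, 16]                  # assumed dims of A1..A4
A = [np.random.randn(p, M) for p in P]   # visual / background / sequence / device sets
W = [np.random.randn(M, p) for p in P]   # learned archive-model parameters

D = sum(w @ a for w, a in zip(W, A))     # M x M archive matrix
```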
  • The parameters of the archiving model can be obtained by training a deep learning network; the deep learning network can be a convolutional neural network (CNN), a residual network (ResNet), a graph convolutional neural network (GCN), a Transformer, etc.
  • In one embodiment, the training sample data is labeled data, and the deep learning network is trained directly in a supervised manner. The training samples are several groups of sample images, each group constituting one training sample. The electronic archives of each group of sample images with respect to the target object are known; that is, the similarity result between any two sample images within a group is known. First, the visual feature information and auxiliary feature information of each sample image in each group are extracted (for the specific method, refer to the extraction of the target object's visual feature information and auxiliary feature information in the electronic archive generation method of the embodiments of the present application). The groups are then divided into a training set and a test set, each containing multiple groups of sample images. The training of the deep learning network includes: training on the training set, evaluating the model's accuracy on the training set and the test set, and completing the training of the deep learning network once the evaluation results meet preset conditions.
  • In another embodiment, a subset of high-quality electronic archives can first be obtained by means of feature-search similarity, and those high-quality partial archives can then be used as training samples for supervised training of the deep learning network; the specific method is the same as above and is not repeated here.
  • In the archive matrix actually computed by the model, each value is not necessarily exactly 0 or 1 but a value close to 0 or 1. The values in the archive matrix can therefore be binarized; for example, taking 0.5 as the threshold, a value in archive matrix D greater than 0.5 is taken as 1, and a value less than or equal to 0.5 is taken as 0. After binarization every value in the archive matrix is 0 or 1, where 1 indicates that the image with the row index and the image with the column index belong to the same electronic archive.
  • For example, an archive matrix D0 may indicate that the number of all target objects corresponding to the images participating in automatic archive generation is four, and that the generated result is that the first and second target objects belong to the same individual while the third and fourth target objects belong to the same individual. Then the images to be archived corresponding to the first and second target objects belong to one electronic archive, and the images corresponding to the third and fourth target objects belong to another electronic archive.
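  • The sketch below turns a (near-0/1) archive matrix into electronic archives as just described: threshold at 0.5, then treat each connected group of indices as one archive, matching the D0 example ({first, second} and {third, fourth}). The union-find grouping is an implementation choice, not something the patent prescribes.

```python
# Sketch: binarize the archive matrix and group target objects into archives.
import numpy as np

def archives_from(D: np.ndarray, thr: float = 0.5) -> list:
    B = (D > thr).astype(int)            # binarize: 1 = same individual
    parent = list(range(len(B)))         # union-find over target-object indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(B)):
        for j in range(len(B)):
            if B[i, j]:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(B)):
        groups.setdefault(find(i), set()).add(i)
    return list(groups.values())

D0 = np.array([[1, 1, 0, 0], [1, 1, 0, 0],
               [0, 0, 1, 1], [0, 0, 1, 1]], dtype=float)
print(archives_from(D0))                 # [{0, 1}, {2, 3}]
```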
  • In the embodiments of the present application, all target objects are clustered according to the visual feature information and auxiliary feature information of all target objects corresponding to the multiple images to be archived, obtaining electronic archives and avoiding the low clustering accuracy that results from relying on visual feature information alone. The auxiliary feature information represents the time or space information associated with the corresponding target object, so introducing the target object's time or space information into the archiving process through auxiliary features effectively supplements its visual feature information. Especially for visual data with insufficient visual feature information or low data quality, archiving accuracy can be effectively improved, so as to meet the business requirements of actual engineering applications.
  • FIG. 3 shows a structural block diagram of an apparatus for generating electronic archives provided by the embodiments of the present application. For ease of description, only the parts related to the embodiments of the present application are shown.
  • Referring to FIG. 3, the electronic archive generation apparatus 3 includes:
  • an image acquisition unit 31 configured to acquire multiple images to be archived;
  • a first feature information acquisition unit 32 configured to perform target object recognition on each image to be archived and obtain the visual feature information of all target objects corresponding to the multiple images to be archived;
  • a second feature information acquisition unit 33 configured to acquire, from the multiple images to be archived, the auxiliary feature information corresponding to each target object, the auxiliary feature information representing the time or space information associated with the corresponding target object;
  • an electronic archive forming unit 34 configured to cluster all target objects according to the visual feature information and auxiliary feature information corresponding to each target object, obtaining multiple electronic archives.
  • In one embodiment, the auxiliary feature information includes background feature information, and when the second feature information acquisition unit 33 is used to acquire the auxiliary feature information corresponding to each target object from the multiple images to be archived, it is specifically used to: perform differential processing between the image to be archived corresponding to each target object and the visual feature information of that target object, obtaining the background feature information corresponding to each target object.
  • In one embodiment, the auxiliary feature information also includes acquisition device attribute information, and when the second feature information acquisition unit 33 acquires the auxiliary feature information corresponding to each target object from the multiple images to be archived, it is further specifically used to obtain the acquisition device attribute information corresponding to each target object. The acquisition device attribute information may include target object density information, target object acquisition angle information, and/or sharpness information.
  • When used to acquire the acquisition device attribute information corresponding to each target object, the second feature information acquisition unit 33 is specifically configured to: count the target objects in the image to be archived corresponding to each target object, and use the resulting count as the target object density information corresponding to each target object.
  • When used to acquire the acquisition device attribute information corresponding to each target object, it is also specifically configured to: acquire multiple historical images whose acquisition conditions are the same as those of the current image to be archived; input the historical images into the pre-trained acquisition angle model to obtain multiple historical acquisition angle values; and average them to obtain the acquisition angle information of the corresponding target object.
  • When used to acquire the acquisition device attribute information corresponding to each target object, it is likewise specifically configured to: acquire multiple historical images whose acquisition conditions are the same as those of the current image to be archived; input them into the pre-trained sharpness model to obtain multiple historical sharpness values; and average them to obtain the sharpness information.
  • In one embodiment, the auxiliary feature information further includes spatio-temporal sequence feature information, which characterizes the activity pattern of the corresponding target object in the target scene.
  • When used to acquire the auxiliary feature information corresponding to each target object from the multiple images to be archived, the second feature information acquisition unit 33 is also specifically used to: pre-cluster all target objects according to their visual feature information to obtain Y target object sets; obtain the spatio-temporal feature information of the images to be archived corresponding to the target objects in each set, where the spatio-temporal feature information is the acquisition time of the image to be archived and the serial number of the acquisition device; calculate the spatio-temporal sequence feature information corresponding to each target object set; and use the spatio-temporal sequence feature information corresponding to each set as the spatio-temporal sequence feature information of every target object in that set.
  • When used to acquire the auxiliary feature information corresponding to each target object from the multiple images to be archived, the second feature information acquisition unit 33 is also specifically used to: for a first target object that cannot be pre-clustered into the Y target object sets, compare the similarity between the visual feature information corresponding to the first target object and the visual feature information corresponding to each target object belonging to the Y sets, obtaining the top K target objects with the highest similarity; add the first target object to the candidate target object set, the candidate set being the one among the Y sets containing the largest number of the top K target objects, thereby obtaining updated target object sets; and use the spatio-temporal sequence feature information of each updated set as the spatio-temporal sequence feature information of all target objects in that updated set.
  • When used to cluster all target objects according to the visual feature information and auxiliary feature information corresponding to each target object to obtain multiple electronic archives, the electronic archive forming unit 34 is specifically used to: input the visual feature information and auxiliary feature information of the target objects into the pre-trained archiving model for processing, obtaining the similarity results between target objects; and cluster all target objects according to the similarity results to obtain multiple electronic archives.
  • FIG. 4 is a schematic structural diagram of a terminal device provided by an embodiment of the present application.
  • As shown in FIG. 4, the terminal device 4 of this embodiment includes: at least one processor 40 (only one is shown in FIG. 4), a memory 41, and a computer program 42 stored in the memory 41 and runnable on the at least one processor 40. When the processor 40 executes the computer program 42, the steps in any of the above embodiments of the electronic archive generation method are implemented.
  • The terminal device 4 may be a computing device such as a desktop computer, a notebook, a palmtop computer, or a cloud server. The terminal device may include, but is not limited to, the processor 40 and the memory 41. Those skilled in the art can understand that FIG. 4 is only an example of the terminal device 4 and does not constitute a limitation on it; the terminal device may include more or fewer components than shown, or combine certain components, or use different components, and may, for example, also include input and output devices, network access devices, and so on.
  • The so-called processor 40 may be a central processing unit (CPU), and the processor 40 may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • The memory 41 may in some embodiments be an internal storage unit of the terminal device 4, such as a hard disk or memory of the terminal device 4. The memory 41 may in other embodiments also be an external storage device of the terminal device 4, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the terminal device 4. Further, the memory 41 may include both an internal storage unit of the terminal device 4 and an external storage device. The memory 41 is used to store the operating system, application programs, boot loader (BootLoader), data, and other programs, such as the program code of the computer program; the memory 41 can also be used to temporarily store data that has been output or will be output.
  • An embodiment of the present application also provides a network device, which includes: at least one processor, a memory, and a computer program stored in the memory and runnable on the at least one processor; when the processor executes the computer program, the steps in any of the above method embodiments are implemented.
  • An embodiment of the present application also provides a computer-readable storage medium storing a computer program; when the computer program is executed by a processor, the steps in each of the foregoing method embodiments are implemented.
  • An embodiment of the present application provides a computer program product; when the computer program product is run on a mobile terminal, the mobile terminal implements the steps in the foregoing method embodiments when executing it.
  • If the integrated unit is realized in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, all or part of the procedures in the methods of the above embodiments of the present application can be completed by instructing the relevant hardware through a computer program; the computer program can be stored in a computer-readable storage medium, and when executed by a processor, the steps of the above method embodiments can be implemented. The computer program includes computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form.
  • The computer-readable medium may at least include: any entity or device capable of carrying the computer program code to the electronic archive generation apparatus/terminal device, a recording medium, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, and a software distribution medium, for example a USB flash drive, a mobile hard disk, a magnetic disk, or an optical disc. In some jurisdictions, under legislation and patent practice, computer-readable media may not include electrical carrier signals and telecommunication signals.
  • The disclosed apparatus/network device and method may be implemented in other ways.
  • The apparatus/network device embodiments described above are merely illustrative.
  • The division into modules or units is only a division by logical function.
  • The mutual coupling, direct coupling, or communication connection shown or discussed may be implemented through certain interfaces; the indirect coupling or communication connection between apparatuses or units may be electrical, mechanical, or in other forms.
  • Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Databases & Information Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An electronic archive generation method and apparatus, a terminal device, and a storage medium. The method includes: acquiring multiple images to be archived; performing target object recognition on each image to be archived to obtain visual feature information of all target objects corresponding to the multiple images to be archived; acquiring, from the multiple images to be archived, auxiliary feature information corresponding to each target object, the auxiliary feature information characterizing temporal or spatial information associated with the corresponding target object; and clustering all target objects according to the visual feature information and the auxiliary feature information corresponding to each target object to obtain multiple electronic archives. All target objects in the images to be archived are clustered using both the visual feature information and the auxiliary feature information of the target objects; the auxiliary feature information introduces the temporal or spatial information of the target objects into the archiving process, improving clustering accuracy and meeting the business requirements of practical engineering applications.

Description

Electronic archive generation method and apparatus, terminal device, and storage medium

This application claims priority to Chinese patent application No. 202111320826.2, entitled "Electronic archive generation method and apparatus, terminal device, and storage medium", filed with the China National Intellectual Property Administration on November 9, 2021, the entire contents of which are incorporated herein by reference.

Technical Field

The present application belongs to the technical field of data processing, and in particular relates to an electronic archive generation method and apparatus, a terminal device, and a storage medium.

Background

As the Internet and the Internet of Things play an ever deeper role in daily life, digitizing data has become a crucial link. Electronic archives, as digitized and archived data, are indispensable for mining latent facts and patterns. Digitized and archived data can be used for big data analysis and mining, for training machine learning models, and even for extracting valuable intelligence for practical business deployment. Especially in the last decade, machine learning models in the field of machine vision have reached a level where they can replace manual work. With the important progress of deep learning in machine vision, electronic archives of visual data have gradually become part of every aspect of people's lives.

Visual data generally includes video data and image data. Automatic generation of visual data electronic archives is the process of performing data governance and analysis on user-defined target objects and clustering the data according to certain rules. For example, given a certain number of snapshots from terminal cameras, with the user-defined target object being a person, automatic electronic archive generation gathers the snapshots of the same person together and separates the snapshots of different people, eventually forming a snapshot archive for each person, known in engineering practice as "one person, one archive". Depending on the user-defined target object, the automatically generated archive may concern people, objects, or even a collection embodying an abstract concept. In engineering practice there are "one person, one archive", "one vehicle, one archive", "one relationship, one archive", and so on. In "one relationship, one archive", the user-defined target object is an abstract concept; the specific relationship may be a friendship, a family relationship, etc. "One relationship, one archive" over visual data enables the modeling, analysis, and mining of people's social relationships.

Current automatic generation of visual data electronic archives generally clusters visual data based on feature information of a single dimension, especially single-dimension visual feature information. For example, in "one person, one archive", traditional models and algorithms are based on visual features of the target object in images or videos (similar to key-point features of the target body). Traditional methods for automatically generating visual data electronic archives have low clustering accuracy and often fail to meet business requirements in practical engineering applications.
Summary of the Invention

Embodiments of the present application provide an electronic archive generation method and apparatus, a terminal device, and a storage medium, which can solve the technical problem that traditional methods for automatically generating visual data electronic archives have low clustering accuracy and often fail to meet business requirements in practical engineering applications.

In a first aspect, an embodiment of the present application provides an electronic archive generation method, including:

acquiring multiple images to be archived;

performing target object recognition on each image to be archived to obtain visual feature information of all target objects corresponding to the multiple images to be archived;

acquiring, from the multiple images to be archived, auxiliary feature information corresponding to each target object, the auxiliary feature information characterizing temporal or spatial information associated with the corresponding target object;

clustering all target objects according to the visual feature information and the auxiliary feature information corresponding to each target object to obtain multiple electronic archives.

In a second aspect, an embodiment of the present application provides an electronic archive generation apparatus, including:

an image acquisition unit, configured to acquire multiple images to be archived;

a first feature information acquisition unit, configured to perform target object recognition on each image to be archived to obtain visual feature information of all target objects corresponding to the multiple images to be archived;

a second feature information acquisition unit, configured to acquire, from the multiple images to be archived, auxiliary feature information corresponding to each target object, the auxiliary feature information characterizing temporal or spatial information associated with the corresponding target object;

an electronic archive formation unit, configured to cluster all target objects according to the visual feature information and the auxiliary feature information corresponding to each target object to obtain multiple electronic archives.

In a third aspect, an embodiment of the present application provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the steps of the method of the first aspect.

In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program, where the computer program, when executed by a processor, implements the steps of the method of the first aspect.

In a fifth aspect, an embodiment of the present application provides a computer program product which, when run on a terminal device, causes the terminal device to execute the steps of the method of the first aspect.

As can be seen from the above, the embodiments of the present application first perform target object recognition on multiple images to be archived, and then cluster all target objects according to the visual feature information and auxiliary feature information of all target objects corresponding to the multiple images, obtaining multiple electronic archives. This avoids the low clustering accuracy caused by relying solely on the target objects' visual feature information. Meanwhile, the auxiliary feature information characterizes the temporal or spatial information associated with the target objects, so introducing this information into the archiving process through the auxiliary features effectively supplements the visual feature information. Especially for visual data in which the target objects' visual feature information is insufficient, or whose data quality is low, archiving accuracy can be effectively improved, thereby meeting the business requirements of practical engineering applications.
Brief Description of the Drawings

In order to explain the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.

Fig. 1 is a schematic diagram of an application scenario provided by an embodiment of the present application;

Fig. 2 is a schematic flowchart of an electronic archive generation method provided by an embodiment of the present application;

Fig. 3 is a schematic structural diagram of an electronic archive generation apparatus provided by an embodiment of the present application;

Fig. 4 is a schematic structural diagram of a terminal device provided by an embodiment of the present application.
Detailed Description

In the following description, for the purpose of illustration rather than limitation, specific details such as particular system structures and techniques are set forth in order to provide a thorough understanding of the embodiments of the present application. However, it should be clear to those skilled in the art that the present application can also be implemented in other embodiments without these specific details. In other cases, detailed descriptions of well-known systems, apparatuses, circuits, and methods are omitted so that unnecessary details do not obscure the description of the present application.

It should be understood that when used in the specification and the appended claims of the present application, the term "comprising" indicates the presence of the described features, wholes, steps, operations, elements, and/or components, but does not exclude the presence or addition of one or more other features, wholes, steps, operations, elements, components, and/or collections thereof.

It should also be understood that the term "and/or" used in the specification and the appended claims of the present application refers to any combination and all possible combinations of one or more of the associated listed items, and includes these combinations.

As used in the specification and the appended claims of the present application, the term "if" may be interpreted, depending on the context, as "when" or "once" or "in response to determining" or "in response to detecting". Similarly, the phrase "if it is determined" or "if [the described condition or event] is detected" may be interpreted, depending on the context, as "once it is determined" or "in response to determining" or "once [the described condition or event] is detected" or "in response to detecting [the described condition or event]".

In addition, in the description of the specification and the appended claims of the present application, the terms "first", "second", "third", and so on are used only to distinguish descriptions and cannot be understood as indicating or implying relative importance.

Reference in this specification to "one embodiment" or "some embodiments" and the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, the phrases "in one embodiment", "in some embodiments", "in some other embodiments", "in still other embodiments", etc., appearing in different places in this specification do not necessarily all refer to the same embodiment, but mean "one or more but not all embodiments" unless specifically emphasized otherwise. The terms "comprising", "including", "having", and their variants all mean "including but not limited to" unless specifically emphasized otherwise.
Specifically, an electronic archive in the embodiments of the present application refers to clustering visual data having the same target object attribute into one archive, while visual data with different target object attributes are placed in different archives. The target object attribute of an archive is user-defined. For example, when the target object attribute is defined as a person, the electronic archive generated in the embodiments of the present application is a collection of visual data organized per person. Of course, the target object attribute can be set as needed; for example, it may be a person or a vehicle.

The electronic archive generation method provided by the present application can be applied to the scenario shown in Fig. 1, which includes multiple capture devices 11 and an electronic archive generation device 12. The multiple capture devices are installed at different positions; they send the multiple images they capture to the electronic archive generation device 12, which generates the electronic archives: for each individual target object involved across all the images, an archive is established and the corresponding images are placed into it.

The multiple capture devices 11 may be installed at multiple different positions within a fixed area, for example at different locations of a shopping mall, such as the elevator entrances, the mall entrance, the escalator landings, and the mall exits. The electronic archive generation device 12 archives the people appearing across the different capture devices, generating "one person, one archive" for that mall.

The electronic archive generation device in the embodiments of the present application may be a server, a mobile phone, a tablet computer, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), or another terminal device. The embodiments of the present application place no restriction on the specific type of the terminal device.

In another embodiment, the capture device 11 has certain data processing capability: while capturing images, it may also preprocess them before sending them to the electronic archive generation device 12. The preprocessing may consist of structuring the captured images and extracting the visual feature information of the target objects therein, and then transmitting the structured images and the corresponding visual feature information to the electronic archive generation device 12 for the subsequent archiving operations. The data processing capability of the capture devices can relieve the data processing load of the electronic archive generation device 12.

The capture device in the embodiments of the present application may be a camera with data processing capability; it may also be a terminal device with a camera function, such as a mobile phone, a tablet computer, or a notebook computer. The embodiments of the present application place no restriction on the specific type of capture device. Of course, in some embodiments the capture device may also be an ordinary camera.
The electronic archive generation method provided by the present application is described below by way of specific embodiments.

Fig. 2 shows a schematic flowchart of the electronic archive generation method provided by the present application. Referring to Fig. 2, the method includes:

Step S201: acquire multiple images to be archived.

The images to be archived in the embodiments of the present application come from visual data, which may be image data and/or video data; an image to be archived may be an image obtained directly from a capture device, or an image frame derived from video data obtained by a capture device. The multiple images to be archived may be obtained directly from the capture devices, or may have been obtained from the capture devices in advance and stored.

In one embodiment, the images to be archived need to be structured before feature extraction; structuring an image refers to extracting the pixel value of each pixel of the image.

In one embodiment, a pooling operation may also be applied to the structured images so that all images have a uniform size, which facilitates the subsequent feature information extraction and feature fusion steps.

In the embodiments of the present application, each image to be archived may contain one or more target objects.
Step S202: perform target object recognition on each image to be archived to obtain the visual feature information of all target objects corresponding to the multiple images to be archived.

In the embodiments, visual feature information comprises the visual characteristics possessed by a target object itself, specifically, features that can distinguish different individual target objects from a visual perspective. Different individual target objects have different visual feature information. For example, if archiving targets people, i.e., the images to be archived are to form electronic archives on a one-person-one-archive basis, the visual feature information may be facial features or human body features.

In the embodiments, "all target objects corresponding to the multiple images to be archived" refers to all target objects included across the multiple images. For example, suppose electronic archives are to be generated for four images to be archived: the first contains 1 target object, the second contains 3, the third contains 3, and the fourth contains 2. Then the total number of target objects corresponding to these four images is the sum, i.e., 9.

Illustratively, target object recognition is the process of extracting the target objects' visual feature information. The visual feature information can be obtained by feature extraction on the images to be archived using a pretrained deep learning network model; the deep learning network model in the embodiments of the present application may be a convolutional neural network (CNN), a residual network (ResNet), a graph convolutional network (GCN), a Transformer, etc., which is not specifically limited here. Using a deep learning network model to extract visual feature information is a conventional technique in the art and is not elaborated here.
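By way of illustration only, the following is a minimal Python sketch of this extraction step, assuming PyTorch/torchvision is available; the ResNet-18 backbone, the 224×224 input size, and the `extract_visual_features` helper are assumptions of the sketch, not requirements of the present application.

```python
# Sketch: extracting one visual feature vector per detected target-object crop
# using a pretrained backbone whose classification head has been removed.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()   # keep the 512-d global-average-pool output
backbone.eval()

# Resizing to one common size plays the role of the pooling step above.
preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def extract_visual_features(crops):
    """Return an (N, 512) matrix: one L2-normalized feature row per target crop."""
    batch = torch.stack([preprocess(c.convert("RGB")) for c in crops])
    with torch.no_grad():
        feats = backbone(batch)
    return torch.nn.functional.normalize(feats, dim=1)
```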
Step S203: acquire, from the multiple images to be archived, the auxiliary feature information corresponding to each target object, the auxiliary feature information characterizing temporal or spatial information associated with the corresponding target object.

The auxiliary feature information in the embodiments is time-related or space-related information associated with the corresponding target object; it can supplement the target object's visual feature information from a temporal or spatial perspective, thereby fully exploiting the relevant information in the images to be archived and laying the foundation for subsequently achieving accurate clustering of the images to be archived.

In one embodiment, the auxiliary feature information may include background feature information. Background feature information specifically refers to the background information of the image to be archived corresponding to a target object after the target object's visual feature information has been removed. It can be understood that the background feature information may be obtained as follows: difference processing is applied between each target object's corresponding image to be archived and that target object's visual feature information, yielding the background feature information of each target object. For example, when a target object's visual feature information in an image to be archived is incomplete or unclear, the background feature information can complement it well; for instance, other images to be archived whose background feature information is highly similar can be used to cluster the target object in that image.

Illustratively, suppose an image to be archived contains 3 target objects, numbered a, b, and c. The background feature information of target object a is the image information of that image with target object a removed; that is, in this image both target object b and target object c belong to target object a's background feature information. By analogy, the background feature information of target objects b and c can be obtained.
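Purely as an illustrative sketch, the difference processing described above can be pictured as blanking out a target's own region so that everything else in the frame, including the other targets, remains as background. The bounding-box representation of a target and the crude per-channel-mean descriptor below are assumptions of the sketch.

```python
# Sketch: background feature information as "image minus target object".
import numpy as np

def blank_target(image: np.ndarray, box: tuple) -> np.ndarray:
    """Copy of `image` (H, W, C) with the target's box (x1, y1, x2, y2) zeroed."""
    x1, y1, x2, y2 = box
    bg = image.copy()
    bg[y1:y2, x1:x2] = 0          # remove the target object itself
    return bg

def background_feature(image: np.ndarray, box: tuple) -> np.ndarray:
    """Crude background descriptor: per-channel mean over all non-target pixels."""
    x1, y1, x2, y2 = box
    mask = np.ones(image.shape[:2], dtype=bool)
    mask[y1:y2, x1:x2] = False
    return image[mask].mean(axis=0)
```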
In one embodiment, the auxiliary feature information may also include capture device attribute information. Capture device attribute information refers to information a target object possesses owing to certain attributes of the capture device, in particular, device-related information that affects the target object's visual feature information. Capture device attribute information can provide additional informational support for forming highly accurate visual data electronic archives.

Illustratively, the capture device attribute information in the embodiments may include target object density information, target object capture angle information, and/or clarity information. All three have a considerable influence on the quality of the extracted visual feature information, so the specific requirements placed on the visual feature information during archiving can be adjusted according to them. For example, when comparing, by visual feature information, the similarity of two target objects belonging to different images to be archived, different similarity thresholds can be set for target objects with different capture device attribute information (target object density information, target object capture angle information, and/or clarity information).

In the embodiments of the present application, target object density information refers to the number of target objects in the image to be archived corresponding to that target object. Generally, target object density is related to where the capture device is installed, which is why the embodiments classify it as capture device attribute information; it follows that target object density information is associated with the spatial information of the image to be archived.

Taking an application scenario where the target objects are people as a concrete illustration: when the target object is a person, target object density refers to the crowd density, i.e., the number of heads, in the image to be archived corresponding to a given target object. Crowd density has a considerable influence on the archiving result, because it strongly affects the quality of the extracted visual feature information: generally, the visual feature information extracted from an image with high crowd density is of lower quality, so the requirements on that image must be relaxed in subsequent clustering to achieve accurate clustering. For example, suppose clustering is performed by comparing the similarity of target objects' visual feature information: for an image with low crowd density, a similarity above 80% indicates the same person, whereas for an image with high crowd density, a similarity above 60% already suffices, as the sketch below illustrates.
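A minimal sketch of such a density-adaptive threshold, using the 80%/60% figures from the example above; the cut-off of 10 heads per image is an assumed tuning parameter, not a value prescribed by the present application.

```python
def same_person_threshold(crowd_density: int, dense_cutoff: int = 10) -> float:
    """Relax the matching threshold for crowded scenes, per the example above."""
    return 0.6 if crowd_density >= dense_cutoff else 0.8

assert same_person_threshold(crowd_density=3) == 0.8    # sparse scene
assert same_person_threshold(crowd_density=25) == 0.6   # dense scene
```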
In the embodiments of the present application, the target object density information may be the density in the target object's current image to be archived, or the average target object density over a preset historical period. The average target object density information refers to the average of the density values of the historical images obtained, within the preset historical period, by the capture device that produced the image to be archived.

When the target object density information is that of the current image to be archived, it is obtained as follows: count the target objects in each target object's corresponding image to be archived to obtain the number of target objects, and take this number as the target object density information for each target object.

When the target object density information is the average over a preset historical period, it is obtained as follows: acquire multiple historical images captured under the same conditions as the current image to be archived (the same conditions meaning the same installation position and the same capture parameters, e.g., from the same capture device); count the target objects in each historical image and average the counts to obtain the average target object density information, which serves as the target object density information of the corresponding target object.
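Both variants reduce to counting detections, as the following sketch illustrates; `detect_targets` stands in for any detector returning one result per target and is an assumption of the sketch.

```python
from statistics import mean

def current_density(image, detect_targets) -> int:
    """Density of the current image: the number of detected targets."""
    return len(detect_targets(image))

def average_density(history_images, detect_targets) -> float:
    """Average density over historical images from the same device/position."""
    return mean(len(detect_targets(img)) for img in history_images)
```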
In the embodiments, target object capture angle information refers to the deflection angles of the target object; optionally, the deflection angles may include three angles: roll, pitch, and yaw. The target object capture angle is related to the installation position and orientation of the device.

Taking the scenario where the target objects are people as an example, target object capture angle information refers to the angle of the captured face. Generally the capture device can be assumed fixed, so the capture angle is related to people's activity patterns and to how the capture device is installed. Hence the embodiments of the present application classify the capture angle information as device attribute information, and it follows that the capture angle information is associated with the spatial information of the image to be archived corresponding to the target image.

In the embodiments of the present application, the target object capture angle information may be that of the target object's current image to be archived, or a historical average over a preset historical period.

When it is the capture angle information of the current image to be archived, it may be obtained as follows: input the current image into a pretrained capture angle acquisition model for processing to obtain the target object capture angle information.

When it is a historical average, it may be obtained as follows: acquire multiple historical images captured under the same conditions as the current image to be archived (the same conditions meaning the same installation position and the same capture parameters, e.g., from the same capture device); input the multiple historical images into the pretrained capture angle acquisition model for processing to obtain multiple pieces of historical capture angle information; and average them to obtain the target object capture angle information.

In the embodiments of the present application, the clarity information may be that of the target object's current image to be archived, or a historical average over a preset historical period.

When the clarity information is that of the current image to be archived, it may be obtained as follows: input the current image into a pretrained clarity model for processing to obtain the clarity information.

When the clarity information is a historical average, it may be obtained as follows: acquire multiple historical images captured under the same conditions as the current image to be archived (the same conditions meaning the same installation position and the same capture parameters, e.g., from the same capture device); input the multiple historical images into the pretrained clarity model for processing to obtain multiple pieces of historical clarity information; and average them to obtain the clarity information.
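Since the angle and clarity cases follow the same pattern — run a pretrained attribute model over historical images from the same device, then average — one sketch can cover both; the model call signatures below are assumptions, and any estimator returning a number or a fixed-length vector per image would fit.

```python
import numpy as np

def historical_attribute(history_images, attribute_model) -> np.ndarray:
    """Average an attribute model's outputs over same-device historical images."""
    outputs = [np.asarray(attribute_model(img)) for img in history_images]
    return np.mean(outputs, axis=0)

# Under the stated assumptions:
#   angle_info   = historical_attribute(history, pose_model)     # e.g. [roll, pitch, yaw]
#   clarity_info = historical_attribute(history, clarity_model)  # e.g. a scalar score
```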
The capture angle acquisition model and the clarity model in the embodiments of the present application may both be deep learning network models; specifically, the deep learning network model may be a convolutional neural network (CNN), a residual network (ResNet), a graph convolutional network (GCN), a Transformer, etc. Deep learning network models for different purposes are trained on different datasets using conventional training methods. Using deep learning network models to extract target object capture angle information and clarity information is a conventional technique in the art and is not elaborated here.

In the embodiments of the present application, the capture angle information and clarity information derived from a single image carry considerable randomness, and this randomness would introduce significant noise when used for clustering. Using historical averages reduces the noise caused by randomness, making the obtained capture angle information and clarity information more representative and reducing the impact of noise on clustering accuracy.

Capture angle information and clarity information greatly influence the extraction of the target object's visual feature information. Taking the person scenario as a concrete illustration: the captured angle may yield a frontal face or a side face; when an image to be archived captures a frontal face, the extracted visual feature information (facial features) is of better quality than when a side face is captured, so the subsequent clustering works better. As for clarity, the higher the clarity of the face, the better the subsequent clustering effect.
In one embodiment, the auxiliary feature information may also include spatio-temporal sequence feature information, which characterizes the activity pattern of the corresponding target object in the target scene. The spatio-temporal sequence feature information is associated with the target object and is derived by summarizing the activity patterns of the target objects across all images to be archived. Its specific role is to recover visual data that would otherwise be missed due to insufficient visual feature information, thereby completing the clustering of such data. For example, when the target objects are people, an image to be archived may fail to cluster correctly because of the angle of the face or body features in it; the spatio-temporal sequence feature information associated with the target object can then rescue such angle-induced omissions. For instance, if the spatio-temporal sequence feature information shows that target object A often appears at capture device 1 around 8 a.m., then when an image captured by device 1 at 8 a.m. contains a target object whose visual features are unclear, it can be judged that this target object is quite likely A, thereby recovering the visual data.

Illustratively, when the auxiliary feature information includes spatio-temporal sequence feature information, acquiring the auxiliary feature information corresponding to each target object from the multiple images to be archived may include:

pre-clustering all target objects according to the visual feature information corresponding to each target object, obtaining Y target object sets;

if all target objects can be clustered into the Y target object sets, acquiring the spatio-temporal feature information of the images to be archived corresponding to the target objects in each target object set, and then computing the spatio-temporal sequence feature information corresponding to each target object set, where the spatio-temporal feature information consists of the capture time and capture device number of the image to be archived;

taking the spatio-temporal sequence feature information corresponding to each target object set as the spatio-temporal sequence feature information of every target object in that set.

In the embodiments of the present application, pre-clustering yields Y target object sets, where all target objects in one set are, by the pre-clustering criterion, the same individual target object. If an image to be archived contains multiple target objects, the different target objects belong to different target object sets. For example, in the person scenario, obtaining Y target object sets by pre-clustering can be read as a preliminary judgment that all images to be archived contain Y different people.

In the embodiments of the present application, computing the spatio-temporal sequence feature information corresponding to each target object set specifically includes:

taking the spatio-temporal feature information of the image to be archived corresponding to each target object in the set as the spatio-temporal feature information of that target object;

sorting all target objects in each target object set by capture time, obtaining the sorted target object set corresponding to that set;

obtaining, for each sorted target object set, the capture time sequence and the capture device number sequence, and taking these two sequences as the spatio-temporal sequence feature information of the corresponding target object set.

In the embodiments of the present application, within the same target object set, all target objects share the same spatio-temporal sequence feature information, which characterizes how the individual target object corresponding to that set appears at different capture devices over time.
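A minimal sketch of building one set's spatio-temporal sequence feature under these definitions; the `(capture_hour, device_id)` record is an assumed encoding of the image metadata (it follows the 24-hour/10-device discretization used in the worked example further below).

```python
from typing import NamedTuple

class Detection(NamedTuple):
    target_id: int
    capture_hour: int   # assumed 24-hour discretization of the capture time
    device_id: int      # the capture device number

def sequence_feature(target_set: list) -> tuple:
    """Sort one pre-clustered set by capture time, then read off both sequences."""
    ordered = sorted(target_set, key=lambda d: d.capture_hour)
    times = [d.capture_hour for d in ordered]
    devices = [d.device_id for d in ordered]
    return times, devices   # shared by every target object in the set
```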
In one embodiment, acquiring the auxiliary feature information corresponding to each target object from the multiple images to be archived further includes:

if among all target objects there exists a first target object not belonging to the Y target object sets, comparing the similarity between the visual feature information of the first target object and the visual feature information of each target object belonging to the Y target object sets, and obtaining the top K target objects with the highest similarity;

adding the first target object to a candidate target object set, thereby updating the Y target object sets to obtain Y updated target object sets, where the candidate target object set is the target object set that, among the Y sets, contains the largest number of the top K target objects;

acquiring the spatio-temporal feature information of the images to be archived corresponding to the target objects in each updated target object set, and then computing the spatio-temporal sequence feature information corresponding to each updated target object set, where the spatio-temporal feature information consists of the capture time and capture device number of the image to be archived;

taking the spatio-temporal sequence feature information of each updated target object set as the spatio-temporal sequence feature information of all target objects in that updated set.

In the embodiments of the present application, the method of computing the spatio-temporal sequence feature information corresponding to each updated target object set is essentially the same as for each original target object set and is not repeated here.
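A sketch of this fallback assignment; cosine similarity over L2-normalized feature rows is an assumed metric, and K is a tuning parameter (the worked example below uses K = 100).

```python
import numpy as np
from collections import Counter

def assign_unclustered(x_feat: np.ndarray,
                       clustered_feats: np.ndarray,
                       cluster_labels: np.ndarray,
                       k: int = 100) -> int:
    """Label of the set the unclustered object should join.

    clustered_feats: (N, d) L2-normalized rows; cluster_labels: (N,) set IDs.
    """
    sims = clustered_feats @ x_feat              # cosine similarities, shape (N,)
    top_k = np.argsort(sims)[::-1][:k]           # indices of the K most similar objects
    votes = Counter(cluster_labels[top_k].tolist())
    return votes.most_common(1)[0][0]            # the set contributing most of the K
```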
For ease of understanding, the acquisition of the spatio-temporal sequence feature information is explained below with an application scenario in which the target objects are people.

Suppose there are 1000 images to be archived, coming from 10 capture devices, with the target objects being the people in the images. In forming one-person-one-archive electronic archives from the 1000 images, Step S202 performs target object recognition on each image (e.g., face recognition), obtaining the visual feature information (e.g., facial features) of all target objects in the 1000 images, and Step S203 acquires the auxiliary feature information corresponding to each target object. The background feature information and device attribute information within the auxiliary feature information have already been described in detail above. The acquisition of the spatio-temporal sequence feature information in Step S203 is illustrated below.

Case 1: suppose the 1000 images yield 1500 target objects in total. All target objects are pre-clustered according to their visual feature information; suppose 20 target object sets are obtained and all target objects can be clustered into the 20 sets, so the 20 sets correspond to 20 individual target objects. In this case the 1000 images roughly concern 20 different individuals. Suppose target object A corresponds to target object set B1, which contains 60 target objects, each corresponding to one image to be archived. The 60 target objects in set B1 are sorted by the capture time of their corresponding images; then the capture times and capture device numbers of the images corresponding to the 60 sorted target objects are extracted in order, yielding the capture time sequence and capture device number sequence corresponding to set B1. The spatio-temporal sequence feature information of every target object in set B1 is exactly this pair of sequences. Performing this operation once for each target object set yields the spatio-temporal sequence feature information of all target objects.

Illustratively, the capture time may be discretized into the 24 hours of a day, i.e., the capture time takes one of 24 values; in this embodiment there are 10 capture device numbers.

Case 2: suppose again that the 1000 images yield 1500 target objects and pre-clustering by visual feature information yields 20 target object sets corresponding to 20 individuals, but there is a target object X that cannot be clustered into any of the 20 sets. In this case X requires separate handling, specifically: compare the similarity between X's visual feature information and that of the other 1499 target objects, and select the top 100 most similar target objects; if, among the 20 sets, the set containing the most of these 100 target objects is set B2, then B2 is the candidate target object set, and X is added to B2. This updates the 20 target object sets, yielding 20 updated sets. Applying to the 20 updated sets the same operation used for set B1 in Case 1 yields the spatio-temporal sequence feature information of all target objects.

The specific method of acquiring spatio-temporal sequence feature information in the embodiments of the present application can thus handle a target object X that cannot be pre-clustered. With a conventional clustering method for forming electronic archives, in Case 2 target object X might be arbitrarily clustered into some set, or simply discarded. The method in the embodiments of the present application pre-clusters X in a principled way, avoiding arbitrary classification or abandonment, and therefore yields more accurate electronic archives downstream than conventional clustering methods.
Step S204: cluster all target objects according to the visual feature information and the auxiliary feature information corresponding to each target object, obtaining multiple electronic archives.

In the embodiments, each electronic archive corresponds to one specific individual target object; that is, archiving is performed per target object. Each electronic archive includes at least one image to be archived, and all images in one archive contain the same specific individual target object. This can serve engineering applications such as one person, one archive, or one vehicle, one archive.

In the embodiments of the present application, all target objects are clustered according to their visual feature information and auxiliary feature information. By introducing auxiliary feature information, the temporal or spatial information of the target objects is brought into the archiving process, effectively supplementing the visual feature information. Especially for visual data with insufficient visual feature information or low data quality, archiving accuracy can be effectively improved, meeting the business requirements of practical engineering applications.

Illustratively, Step S204 may be implemented as follows: input the visual feature information and auxiliary feature information corresponding to all target objects into a pretrained archiving model for processing, obtaining similarity results between the target objects; then cluster all target objects according to the similarity results, obtaining multiple electronic archives, each corresponding to an individual target object.

To enhance the clustering effect of the archiving model, the training samples of the archiving model preferably come from the same or a similar source as the images to be archived. Here, the same source means that the capture devices of the training samples and of the images to be archived are the same batch of devices; a similar source means that the capture devices are deployed in similar scenes with similar topology.
In one embodiment, the expression of the archiving model is given by equation (1):

$D = f_1(A_1) + f_2(A_2) + f_3(A_3) + f_4(A_4)$      (1)

where $D$ is the archive matrix of dimension $M \times M$, with $M$ the number of all target objects corresponding to the images to be archived; $A_1$ denotes the set of visual feature information of all target objects, $A_2$ the set of background feature information, $A_3$ the set of spatio-temporal sequence feature information, and $A_4$ the set of device attribute information. The functions $f_1, f_2, f_3, f_4$ each apply a matrix transformation to the corresponding set, producing a component matrix that characterizes the pairwise similarity between all target objects; the four component matrices are summed to obtain the final archive matrix.

If the transformation functions are linear, the archiving model takes the form of equation (2):

$D = W_1 A_1 + W_2 A_2 + W_3 A_3 + W_4 A_4$      (2)

In equation (2), $D$ has dimension $M \times M$, with $M$ the number of all target objects corresponding to the images to be archived; $W_1$ has dimension $M \times P_1$ and $A_1$ dimension $P_1 \times M$; $W_2$ has dimension $M \times P_2$ and $A_2$ dimension $P_2 \times M$; $W_3$ has dimension $M \times P_3$ and $A_3$ dimension $P_3 \times M$; $W_4$ has dimension $M \times P_4$ and $A_4$ dimension $P_4 \times M$. Here $P_1$, $P_2$, $P_3$, and $P_4$ are the feature dimensions of the data in $A_1$, $A_2$, $A_3$, and $A_4$, respectively.

Illustratively, the archive matrix obtained from the model is square; its dimension $M$ equals the number of all target objects corresponding to the images to be archived, which is generally greater than or equal to the number of images to be archived.
For ease of understanding, the data structure and the archive matrix are explained below taking the parameter $W_1$ and the visual feature information set $A_1$ of all target objects as an example, where

$W_1 = \alpha_1 A_1^{\mathsf{T}}$

$\alpha_1$ is a weight coefficient and $A_1^{\mathsf{T}}$ is the transpose of matrix $A_1$. Each column of $A_1$ corresponds to the visual feature information of one target object, and each row of $A_1^{\mathsf{T}}$ likewise corresponds to the visual feature information of one target object. For example, the first column of $A_1$ represents the visual feature information of the first target object, and the second row of $A_1^{\mathsf{T}}$ represents that of the second target object; multiplying the second row of $A_1^{\mathsf{T}}$ by the first column of $A_1$ yields the similarity result between the first and second target objects with respect to visual feature information. By analogy, the visual-feature similarity results between every pair of target objects are obtained, giving the matrix $W_1 A_1$. The matrices $W_2 A_2$, $W_3 A_3$, and $W_4 A_4$ are computed analogously. Finally, the archive matrix $D$ is obtained by matrix summation.
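A sketch of this special case, in which each component matrix reduces to $\alpha_i A_i^{\mathsf{T}} A_i$; the scalar weights are placeholders for values a trained model would supply.

```python
import numpy as np

def archive_matrix(feature_sets, alphas):
    """D = sum_i alpha_i * A_i^T A_i, each A_i of shape (P_i, M) with one
    column per target object; returns the (M, M) archive matrix."""
    M = feature_sets[0].shape[1]
    D = np.zeros((M, M))
    for A, alpha in zip(feature_sets, alphas):
        D += alpha * (A.T @ A)   # entry (i, j): similarity of targets i and j
    return D
```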
In the embodiments of the present application, $W_1$ through $W_4$ are the parameters of the linear archiving model. The parameters of the archiving model can be obtained by training a deep learning network, which may be a convolutional neural network (CNN), a residual network (ResNet), a graph convolutional network (GCN), a Transformer, etc.

In the embodiments, if the training sample data are labelled, the deep learning network is trained directly in a supervised manner. For example, the training samples are several groups of sample images, each group constituting one training sample, and the electronic archives of each group with respect to the target objects are known, i.e., the pairwise similarity results between all sample images in each group are known. First, the visual feature information and auxiliary feature information of each sample image in each group are extracted (for the specific method, see the extraction of the target objects' visual feature information and auxiliary feature information in the electronic archive generation method of the embodiments of the present application); the groups are then split into a training set and a test set, each comprising multiple groups of sample images. The training method of the deep learning network includes:

iteratively training the initial deep learning network model with the samples in the training set so as to minimize the loss function, obtaining the corresponding model parameters;

evaluating the model's accuracy on the training set and the test set respectively; if the evaluation result reaches a preset condition, the training of the deep learning network is complete.
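Purely as a sketch of such supervised training, assuming PyTorch: one learnable linear map per feature type parameterizes a bilinear similarity, and a binary cross-entropy loss compares the predicted archive matrix with the known 0/1 pairwise labels. The parameterization and loss are assumptions of the sketch; the embodiment leaves the network family open.

```python
import torch
from torch import nn

class LinearArchiveModel(nn.Module):
    """D = sum_i (A_i W_i) A_i^T: one learnable map W_i per feature type."""
    def __init__(self, dims):                    # dims: [P_1, P_2, P_3, P_4]
        super().__init__()
        self.maps = nn.ModuleList(nn.Linear(p, p, bias=False) for p in dims)

    def forward(self, feats):                    # feats: list of (M, P_i) tensors
        return sum(m(A) @ A.T for m, A in zip(self.maps, feats))

def train_archive_model(model, train_groups, epochs=10, lr=1e-3):
    """train_groups yields (feats, target): feats as above, target an (M, M)
    0/1 matrix of known same-archive labels for one sample group."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        for feats, target in train_groups:
            opt.zero_grad()
            loss = loss_fn(model(feats), target)
            loss.backward()
            opt.step()
```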
In the embodiments, if the training samples are unlabelled data, a high-quality subset of electronic archives can first be obtained by feature similarity search, and this high-quality subset is then used as training samples for supervised training of the deep learning network; the specific method is the same as above and is not repeated here.

Illustratively, the values computed in the archive matrix $D$ are not necessarily exactly 0 or 1 but values close to 0 or 1; in the final processing, the values in the archive matrix can be binarized, for example with a threshold of 0.5: a value in $D$ greater than 0.5 becomes 1, and a value less than or equal to 0.5 becomes 0. It can be understood that, with the archive matrix holding only 0s and 1s, a 1 indicates that the image whose ID is the row index and the image whose ID is the column index belong to the same electronic archive.

Illustratively, as shown in equation (3), the archive matrix $D_0$ represents the case where the number of all target objects corresponding to the images participating in automatic electronic archive generation is four; the generated result is that the first and second target objects belong to the same individual, and the third and fourth target objects belong to the same individual. The images to be archived corresponding to the first and second target objects belong to one electronic archive, and those corresponding to the third and fourth target objects belong to another.

$$D_0 = \begin{pmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 \end{pmatrix} \qquad (3)$$
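A sketch of reading archives out of such a matrix: binarize at the 0.5 threshold described above, then group together targets connected by 1-entries. Treating the 1-entries as edges and taking connected components is an assumption of the sketch for targets chained across more than one pair.

```python
import numpy as np

def archives_from(D: np.ndarray, threshold: float = 0.5) -> list:
    """Binarize the archive matrix and return one index group per archive."""
    same = D > threshold                  # > 0.5 -> 1, <= 0.5 -> 0
    n = len(D)
    labels = [-1] * n
    groups = []
    for seed in range(n):
        if labels[seed] != -1:
            continue
        labels[seed] = len(groups)
        groups.append([])
        stack = [seed]
        while stack:                      # flood-fill one connected component
            i = stack.pop()
            groups[-1].append(i)
            for j in np.nonzero(same[i])[0]:
                if labels[j] == -1:
                    labels[j] = labels[i]
                    stack.append(int(j))
    return groups

# The D_0 example above: targets {0, 1} form one archive, {2, 3} another.
D0 = np.array([[1, 1, 0, 0],
               [1, 1, 0, 0],
               [0, 0, 1, 1],
               [0, 0, 1, 1]], dtype=float)
assert archives_from(D0) == [[0, 1], [2, 3]]
```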
The embodiments of the present application cluster all target objects into electronic archives according to the visual feature information and auxiliary feature information of all target objects corresponding to the multiple images to be archived, avoiding the low clustering accuracy caused by relying on visual feature information of a single dimension. Meanwhile, the auxiliary feature information characterizes the temporal or spatial information associated with the corresponding target object, so introducing this information into the archiving process through the auxiliary features effectively supplements the visual feature information. Especially for visual data with insufficient visual features or low data quality, archiving accuracy can be effectively improved, thereby meeting the business requirements of practical engineering applications. It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
Corresponding to the electronic archive generation method described in the above embodiments, Fig. 3 shows a structural block diagram of the electronic archive generation apparatus provided by the embodiments of the present application; for ease of description, only the parts relevant to the embodiments of the present application are shown.

Referring to Fig. 3, the electronic archive generation apparatus 3 includes:

an image acquisition unit 31, configured to acquire multiple images to be archived;

a first feature information acquisition unit 32, configured to perform target object recognition on each image to be archived to obtain the visual feature information of all target objects corresponding to the multiple images to be archived;

a second feature information acquisition unit 33, configured to acquire, from the multiple images to be archived, the auxiliary feature information corresponding to each target object, the auxiliary feature information characterizing temporal or spatial information associated with the corresponding target object;

an electronic archive formation unit 34, configured to cluster all target objects according to the visual feature information and the auxiliary feature information corresponding to each target object to obtain multiple electronic archives.
In one embodiment, where the auxiliary feature information includes background feature information, the second feature information acquisition unit 33, when acquiring the auxiliary feature information corresponding to each target object from the multiple images to be archived, is specifically configured to: apply difference processing between each target object's corresponding image to be archived and that target object's visual feature information, obtaining the background feature information corresponding to each target object.

Optionally, the auxiliary feature information further includes capture device attribute information, and the second feature information acquisition unit 33, when acquiring the auxiliary feature information corresponding to each target object from the multiple images to be archived, is further specifically configured to acquire the capture device attribute information corresponding to each target object.

Optionally, the capture device attribute information may include target object density information, target object capture angle information, and/or clarity information.

Optionally, when acquiring the capture device attribute information corresponding to each target object, the second feature information acquisition unit 33 is specifically configured to: count the target objects in each target object's corresponding image to be archived to obtain the number of target objects, this number serving as the target object density information corresponding to each target object.

Optionally, when acquiring the capture device attribute information corresponding to each target object, the second feature information acquisition unit 33 is further specifically configured to: acquire multiple historical images captured under the same conditions as the current image to be archived; input the multiple historical images into a pretrained capture angle acquisition model for processing, obtaining multiple pieces of historical capture angle information; and average them to obtain the target object capture angle information of the corresponding target object.

Optionally, when acquiring the capture device attribute information corresponding to each target object, the second feature information acquisition unit 33 is specifically configured to: acquire multiple historical images captured under the same conditions as the current image to be archived; input the multiple historical images into a pretrained clarity model for processing, obtaining multiple pieces of historical clarity information; and average them to obtain the clarity information.

Optionally, the auxiliary feature information further includes spatio-temporal sequence feature information, which characterizes the activity pattern of the corresponding target object in the target scene. The second feature information acquisition unit 33, when acquiring the auxiliary feature information corresponding to each target object from the multiple images to be archived, is further specifically configured to:

pre-cluster all target objects according to the visual feature information corresponding to each target object, obtaining Y target object sets;

if all target objects can be clustered into the Y target object sets, acquire the spatio-temporal feature information of the images to be archived corresponding to the target objects in each target object set, and then compute the spatio-temporal sequence feature information corresponding to each target object set, where the spatio-temporal feature information consists of the capture time and capture device number of the image to be archived;

take the spatio-temporal sequence feature information corresponding to each target object set as the spatio-temporal sequence feature information of every target object in that set.

Optionally, the second feature information acquisition unit 33, when acquiring the auxiliary feature information corresponding to each target object from the multiple images to be archived, is further specifically configured to:

if among all target objects there exists a first target object not belonging to the Y target object sets, compare the similarity between the visual feature information of the first target object and the visual feature information of each target object belonging to the Y target object sets, and obtain the top K target objects with the highest similarity;

add the first target object to a candidate target object set, thereby updating the Y target object sets to obtain Y updated target object sets, where the candidate target object set is the target object set that, among the Y sets, contains the largest number of the top K target objects;

acquire the spatio-temporal feature information of the images to be archived corresponding to the target objects in each updated target object set, and then compute the spatio-temporal sequence feature information corresponding to each updated target object set, where the spatio-temporal feature information consists of the capture time and capture device number of the image to be archived;

take the spatio-temporal sequence feature information of each updated target object set as the spatio-temporal sequence feature information of all target objects in that updated set.

Optionally, the electronic archive formation unit 34, when clustering all target objects according to the visual feature information and auxiliary feature information corresponding to each target object to obtain multiple electronic archives, is further specifically configured to: input the target objects' visual feature information and auxiliary feature information into a pretrained archiving model for processing, obtaining similarity results between the target objects; and cluster all target objects according to the similarity results, obtaining multiple electronic archives.
It should be noted that, since the information exchange and execution processes between the above apparatuses/units are based on the same conception as the method embodiments of the present application, their specific functions and technical effects can be found in the method embodiment section and are not repeated here.

Those skilled in the art can clearly understand that, for convenience and brevity of description, only the division of the above functional units and modules is used as an example; in practical applications, the above functions may be allocated to different functional units and modules as needed, i.e., the internal structure of the apparatus may be divided into different functional units or modules to accomplish all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware or in the form of a software functional unit. In addition, the specific names of the functional units and modules are only for ease of mutual distinction and are not used to limit the protection scope of the present application. For the specific working processes of the units and modules in the above system, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.
Fig. 4 is a schematic structural diagram of the terminal device provided by an embodiment of the present application. As shown in Fig. 4, the terminal device 4 of this embodiment includes: at least one processor 40 (only one is shown in Fig. 4), a memory 41, and a computer program 42 stored in the memory 41 and executable on the at least one processor 40; when the processor 40 executes the computer program 42, the steps in any of the above electronic archive generation method embodiments are implemented.

The terminal device 4 may be a computing device such as a desktop computer, a notebook, a palmtop computer, or a cloud server. The terminal device may include, but is not limited to, the processor 40 and the memory 41. Those skilled in the art can understand that Fig. 4 is merely an example of the terminal device 4 and does not constitute a limitation on it; the device may include more or fewer components than shown, or combine certain components, or use different components; for example, it may also include input/output devices, network access devices, and so on.

The processor 40 may be a central processing unit (CPU); it may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.

In some embodiments, the memory 41 may be an internal storage unit of the terminal device 4, such as a hard disk or internal memory of the terminal device 4. In other embodiments, the memory 41 may also be an external storage device of the terminal device 4, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card fitted to the terminal device 4. Further, the memory 41 may include both an internal storage unit and an external storage device of the terminal device 4. The memory 41 is used to store the operating system, application programs, a boot loader (BootLoader), data, and other programs, such as the program code of the computer program. The memory 41 may also be used to temporarily store data that has been output or is about to be output.
An embodiment of the present application further provides a network device, which includes at least one processor, a memory, and a computer program stored in the memory and executable on the at least one processor; when the processor executes the computer program, the steps in any of the above method embodiments are implemented.

An embodiment of the present application further provides a computer-readable storage medium storing a computer program; when the computer program is executed by a processor, the steps in each of the above method embodiments can be implemented.

An embodiment of the present application provides a computer program product which, when run on a mobile terminal, causes the mobile terminal to implement the steps in each of the above method embodiments when executed.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, all or part of the procedures in the methods of the above embodiments of the present application may be completed by instructing the relevant hardware through a computer program; the computer program may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of each of the above method embodiments. The computer program includes computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable medium may include at least: any entity or device capable of carrying the computer program code to the electronic archive generation apparatus/terminal device, a recording medium, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, and a software distribution medium, for example, a USB flash drive, a removable hard disk, a magnetic disk, or an optical disc. In some jurisdictions, under legislation and patent practice, computer-readable media may not include electrical carrier signals and telecommunication signals.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not detailed or recorded in a certain embodiment, reference may be made to the relevant descriptions of other embodiments.

Those of ordinary skill in the art can realize that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are executed in hardware or software depends on the particular application and design constraints of the technical solution. Professionals may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the present application.

In the embodiments provided by the present application, it should be understood that the disclosed apparatus/network device and method may be implemented in other ways. For example, the apparatus/network device embodiments described above are merely illustrative; for example, the division into modules or units is only a division by logical function, and there may be other divisions in actual implementation, e.g., multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be implemented through certain interfaces; the indirect coupling or communication connection between apparatuses or units may be electrical, mechanical, or in other forms.

Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

The above embodiments are only used to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions recorded in the foregoing embodiments or substitute equivalents for some of the technical features therein; such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and shall all be included within the protection scope of the present application.

Claims (10)

  1. An electronic archive generation method, comprising:
    acquiring multiple images to be archived;
    performing target object recognition on each of the images to be archived to obtain visual feature information of all target objects corresponding to the multiple images to be archived;
    acquiring, from the multiple images to be archived, auxiliary feature information corresponding to each of the target objects, the auxiliary feature information characterizing temporal or spatial information associated with the corresponding target object;
    clustering all of the target objects according to the visual feature information and the auxiliary feature information corresponding to each of the target objects to obtain multiple electronic archives.
  2. The method according to claim 1, wherein the auxiliary feature information comprises background feature information;
    acquiring, from the multiple images to be archived, the auxiliary feature information corresponding to each of the target objects comprises:
    applying difference processing between each target object's corresponding image to be archived and that target object's visual feature information to obtain the background feature information corresponding to each of the target objects.
  3. The method according to claim 1, wherein the auxiliary feature information further comprises capture device attribute information.
  4. The method according to claim 1, wherein the auxiliary feature information further comprises spatio-temporal sequence feature information, the spatio-temporal sequence feature information characterizing the activity pattern of the corresponding target object in a target scene.
  5. The method according to claim 4, wherein acquiring, from the multiple images to be archived, the auxiliary feature information corresponding to each of the target objects comprises:
    pre-clustering all of the target objects according to the visual feature information corresponding to each of the target objects to obtain Y target object sets;
    if all of the target objects can be clustered into the Y target object sets, acquiring spatio-temporal feature information of the images to be archived corresponding to the target objects in each of the target object sets, and then computing the spatio-temporal sequence feature information corresponding to each of the target object sets, wherein the spatio-temporal feature information consists of the capture time and the capture device number of the image to be archived;
    taking the spatio-temporal sequence feature information corresponding to each of the target object sets as the spatio-temporal sequence feature information of each target object in that target object set.
  6. The method according to claim 5, wherein acquiring, from the multiple images to be archived, the auxiliary feature information corresponding to each of the target objects further comprises:
    if among all of the target objects there exists a first target object not belonging to the Y target object sets, comparing the similarity between the visual feature information corresponding to the first target object and the visual feature information corresponding to each target object belonging to the Y target object sets, and obtaining the top K target objects with the highest similarity;
    adding the first target object to a candidate target object set, thereby updating the Y target object sets to obtain Y updated target object sets, wherein the candidate target object set is the target object set that, among the Y sets, contains the largest number of the top K target objects;
    acquiring spatio-temporal feature information of the images to be archived corresponding to the target objects in each updated target object set, and then computing the spatio-temporal sequence feature information corresponding to each updated target object set, wherein the spatio-temporal feature information consists of the capture time and the capture device number of the image to be archived;
    taking the spatio-temporal sequence feature information of each updated target object set as the spatio-temporal sequence feature information of all target objects in that updated set.
  7. The method according to any one of claims 1 to 6, wherein clustering all of the target objects according to the visual feature information and the auxiliary feature information corresponding to each of the target objects to obtain multiple electronic archives comprises:
    inputting the visual feature information and the auxiliary feature information corresponding to all of the target objects into a pretrained archiving model for processing to obtain similarity results between the target objects;
    clustering all of the target objects according to the similarity results to obtain multiple electronic archives.
  8. An electronic archive generation apparatus, comprising:
    an image acquisition unit, configured to acquire multiple images to be archived;
    a first feature information acquisition unit, configured to perform target object recognition on each of the images to be archived to obtain visual feature information of all target objects corresponding to the multiple images to be archived;
    a second feature information acquisition unit, configured to acquire, from the multiple images to be archived, auxiliary feature information corresponding to each of the target objects, the auxiliary feature information characterizing temporal or spatial information associated with the corresponding target object;
    an electronic archive formation unit, configured to cluster all of the target objects according to the visual feature information and the auxiliary feature information corresponding to each of the target objects to obtain multiple electronic archives.
  9. A terminal device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the method according to any one of claims 1 to 7.
  10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1 to 7.
PCT/CN2022/099852 2021-11-09 2022-06-20 Electronic archive generation method and apparatus, terminal device, and storage medium WO2023082641A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111320826.2A CN114187463A (zh) 2021-11-09 2021-11-09 Electronic archive generation method and apparatus, terminal device, and storage medium
CN202111320826.2 2021-11-09

Publications (1)

Publication Number Publication Date
WO2023082641A1 true WO2023082641A1 (zh) 2023-05-19

Family

ID=80540831

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/099852 WO2023082641A1 (zh) 2021-11-09 2022-06-20 Electronic archive generation method and apparatus, terminal device, and storage medium

Country Status (2)

Country Link
CN (1) CN114187463A (zh)
WO (1) WO2023082641A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117765170A (zh) * 2023-12-13 2024-03-26 重庆中法供水有限公司 Three-dimensional visualization management method and system

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114187463A (zh) 2021-11-09 2022-03-15 深圳云天励飞技术股份有限公司 Electronic archive generation method and apparatus, terminal device, and storage medium
CN114359611B (zh) * 2022-03-18 2022-09-06 浙江大华技术股份有限公司 Target archiving method, computer device, and storage apparatus

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110163137A (zh) * 2019-05-13 2019-08-23 深圳市商汤科技有限公司 Image processing method and apparatus, and storage medium
CN110334232A (zh) * 2019-06-28 2019-10-15 深圳市商汤科技有限公司 Archive application method and apparatus, and storage medium
CN110334120A (zh) * 2019-06-28 2019-10-15 深圳市商汤科技有限公司 Archive application method and apparatus, and storage medium
CN114187463A (zh) * 2021-11-09 2022-03-15 深圳云天励飞技术股份有限公司 Electronic archive generation method and apparatus, terminal device, and storage medium

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106446797B (zh) * 2016-08-31 2019-05-07 腾讯科技(深圳)有限公司 Image clustering method and apparatus
CN109710780B (zh) * 2018-12-28 2022-03-15 上海依图网络科技有限公司 Archiving method and apparatus
CN111753601B (zh) * 2019-03-29 2024-04-12 华为技术有限公司 Image processing method and apparatus, and storage medium
CN110543583A (zh) * 2019-06-28 2019-12-06 深圳市商汤科技有限公司 Information processing method and apparatus, image device, and storage medium
CN110390031A (zh) * 2019-06-28 2019-10-29 深圳市商汤科技有限公司 Information processing method and apparatus, image device, and storage medium
CN112132175A (zh) * 2020-08-14 2020-12-25 深圳云天励飞技术股份有限公司 Object classification method and apparatus, electronic device, and storage medium
CN112528782B (zh) * 2020-11-30 2024-02-23 北京农业信息技术研究中心 Underwater fish target detection method and apparatus
CN113052079B (zh) * 2021-03-26 2022-01-21 重庆紫光华山智安科技有限公司 Face-clustering-based regional passenger flow statistics method, system, device, and medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110163137A (zh) * 2019-05-13 2019-08-23 深圳市商汤科技有限公司 Image processing method and apparatus, and storage medium
CN110334232A (zh) * 2019-06-28 2019-10-15 深圳市商汤科技有限公司 Archive application method and apparatus, and storage medium
CN110334120A (zh) * 2019-06-28 2019-10-15 深圳市商汤科技有限公司 Archive application method and apparatus, and storage medium
CN114187463A (zh) * 2021-11-09 2022-03-15 深圳云天励飞技术股份有限公司 Electronic archive generation method and apparatus, terminal device, and storage medium

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117765170A (zh) * 2023-12-13 2024-03-26 重庆中法供水有限公司 Three-dimensional visualization management method and system

Also Published As

Publication number Publication date
CN114187463A (zh) 2022-03-15

Similar Documents

Publication Publication Date Title
WO2023082641A1 (zh) Electronic archive generation method and apparatus, terminal device, and storage medium
CN110163111B (zh) Queue-number calling method and apparatus based on face recognition, electronic device, and storage medium
CN111738120B (zh) Person recognition method and apparatus, electronic device, and storage medium
Li et al. Shot boundary detection based on multilevel difference of colour histograms
CN107609105B (zh) Construction method of a big data acceleration structure
US20240273134A1 Image encoder training method and apparatus, device, and medium
WO2023123923A1 (zh) Person re-identification method and apparatus, computer device, and medium
Petkos et al. Graph-based multimodal clustering for social event detection in large collections of images
WO2013075295A1 (zh) Clothing identification method and system for low-resolution video
CN113963303A (zh) Image processing method, video recognition method, apparatus, device, and storage medium
CN111368867A (zh) Archive classification method and system, and computer-readable storage medium
CN111444362B (zh) Malicious image interception method and apparatus, device, and storage medium
CN113987243A (zh) Image archiving method, image archiving apparatus, and computer-readable storage medium
CN109359530B (zh) Intelligent video surveillance method and apparatus
CN110769259A (zh) Image data compression method for video target tracking trajectory content
CN113780424A (zh) Background-similarity-based real-time online photo clustering method and system
CN113705310A (zh) Feature learning method, target object recognition method, and corresponding apparatuses
CN116246086A (zh) Image clustering method and apparatus, electronic device, and storage medium
CN111160077A (zh) Large-scale dynamic face clustering method
CN114882582A (zh) Gait recognition model training method and system based on federated learning
CN113673550A (zh) Clustering method and apparatus, electronic device, and computer-readable storage medium
CN107092875B (zh) New scene recognition method
Fu et al. A Near-Duplicate Video Cleaning Method Based on AFENet Adaptive Clustering
CN112766139A (zh) Target recognition method and apparatus, storage medium, and electronic device
TWI767459B (zh) Data clustering method, electronic device, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22891450

Country of ref document: EP

Kind code of ref document: A1