CN116958803A - Method and related device for identifying region in space - Google Patents

Method and related device for identifying region in space

Info

Publication number
CN116958803A
Authority
CN
China
Prior art keywords
unit
identified
space
feature
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310144678.6A
Other languages
Chinese (zh)
Inventor
张帆 (Zhang Fan)
黄颖菁 (Huang Yingjing)
郭殿升 (Guo Diansheng)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Tencent Computer Systems Co Ltd
Original Assignee
Shenzhen Tencent Computer Systems Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Tencent Computer Systems Co Ltd filed Critical Shenzhen Tencent Computer Systems Co Ltd
Priority to CN202310144678.6A
Publication of CN116958803A
Legal status: Pending (current)

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/10 - Terrestrial scenes
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/762 - Arrangements using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • G06V10/764 - Arrangements using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V10/765 - Arrangements using classification using rules for classification or partitioning the feature space
    • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 - Fusion of extracted features

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a method and a related device for identifying a region in a space, applied to the field of artificial intelligence. A spatial substance image of a space to be identified under an overhead view is subjected to unit division to obtain a unit to be identified in the space to be identified and a first substance image of the unit to be identified; a second substance image of the unit to be identified under an object view angle is acquired; key point clustering is performed, according to a preset time period, on key point information in the object activity information of the unit to be identified to obtain a first time sequence of the unit to be identified; the first substance image, the second substance image and the first time sequence are respectively input into feature extractors in a region identification model to extract a first depth feature, a second depth feature and a third depth feature; and a first fusion feature obtained through feature fusion of the three depth features is input into a classification layer in the region identification model for region identification, so as to obtain a region identification result of the unit to be identified. Even if different regions with similar substance morphology exist in the space, region identification errors can be reduced and the high-precision requirement of region identification can be met.

Description

Method and related device for identifying region in space
Technical Field
The present application relates to the field of search technologies, and in particular, to a method and an apparatus for identifying a region in a space.
Background
As different areas in a space are gradually planned, identifying the areas in the space in a timely and accurate manner is of great significance for space optimization and space management. At present, with the rapid development of artificial intelligence technology, artificial intelligence is generally adopted in methods for identifying a region in a space; that is, a region identification model based on artificial intelligence technology is applied to the spatial region identification method.
In the related art, the spatial region identification method relies on the spatial substance image alone, and is implemented as follows: first, feature extraction is performed on the image blocks obtained by segmenting the spatial substance image to obtain image features; then, region identification is performed on the image features through a region identification model to obtain a region identification result.
However, the spatial substance image does not describe the spatial region comprehensively enough. When different regions with similar substance morphology exist in the space, identifying the spatial region with the above method, which depends only on the spatial substance image, easily leads to region identification errors, making it difficult to meet the high-precision requirement of spatial region identification.
Disclosure of Invention
In order to solve the above technical problems, the application provides a method and a related device for identifying a region in a space, which can reduce region identification errors even when different regions with similar substance morphology exist in the space, improve the identification precision of the region identification result, and meet the high-precision requirement of region identification in the space.
The embodiment of the application discloses the following technical scheme:
in one aspect, an embodiment of the present application provides a method for identifying a region in a space, where the method includes:
performing unit division on a spatial substance image of a space to be identified under an overhead view to obtain a unit to be identified in the space to be identified and a first substance image of the unit to be identified, wherein a plurality of objects are gathered in the space to be identified;
acquiring a second substance image of the unit to be identified under the object view angle;
performing key point clustering on key point information in the object activity information of the unit to be identified according to a preset time period to obtain a first time sequence of the unit to be identified;
respectively performing feature extraction on the first substance image, the second substance image and the first time sequence through a feature extractor in the region identification model to obtain a first depth feature, a second depth feature and a third depth feature;
and carrying out region identification on the first fusion feature obtained by feature fusion of the first depth feature, the second depth feature and the third depth feature through a classification layer in the region identification model to obtain a region identification result of the unit to be identified.
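As an illustrative aid (not part of the claimed subject matter), the fuse-then-classify structure of the above method can be pictured as three parallel feature extractors whose outputs are concatenated and classified. The following minimal PyTorch sketch assumes the backbones, layer widths, GRU aggregation over the multiple object-view images, fusion by concatenation, and the class name RegionIdentificationModel; none of these specifics are stated by the application.

```python
# Minimal sketch of the claimed fuse-then-classify structure (PyTorch).
# All dimensions and backbones are illustrative assumptions.
import torch
import torch.nn as nn

class RegionIdentificationModel(nn.Module):
    def __init__(self, img_dim=128, seq_dim=64, num_classes=2):
        super().__init__()
        # Extractor 1: first substance image (overhead view) -> first depth feature
        self.overhead_cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, img_dim))
        # Extractor 2: second substance images (object view) -> second depth feature
        self.object_cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, img_dim))
        self.object_rnn = nn.GRU(img_dim, img_dim, batch_first=True)
        # Extractor 3: first time sequence -> third depth feature
        self.series_rnn = nn.GRU(1, seq_dim, batch_first=True)
        # Classification layer over the first fusion feature
        self.classifier = nn.Linear(img_dim * 2 + seq_dim, num_classes)

    def forward(self, overhead_img, object_imgs, time_series):
        # overhead_img: (B, 3, H, W); object_imgs: (B, M, 3, H, W); time_series: (B, T)
        f1 = self.overhead_cnn(overhead_img)
        b, m = object_imgs.shape[:2]
        per_img = self.object_cnn(object_imgs.flatten(0, 1)).view(b, m, -1)
        _, h = self.object_rnn(per_img)              # aggregate the M object-view images
        f2 = h[-1]
        _, h = self.series_rnn(time_series.unsqueeze(-1))
        f3 = h[-1]
        fused = torch.cat([f1, f2, f3], dim=-1)      # feature fusion by concatenation
        return self.classifier(fused)                # region identification result
```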
In another aspect, an embodiment of the present application provides a device for identifying a region in a space, where the device includes a dividing unit, an acquisition unit, a clustering unit, an extraction unit and an identification unit;
the dividing unit is used for performing unit division on a spatial substance image of a space to be identified under an overhead view to obtain a unit to be identified in the space to be identified and a first substance image of the unit to be identified, wherein a plurality of objects are gathered in the space to be identified;
the acquisition unit is used for acquiring a second substance image of the unit to be identified under the object view angle;
the clustering unit is used for clustering key points of the key point information in the object activity information of the unit to be identified according to a preset time period to obtain a first time sequence of the unit to be identified;
the extraction unit is used for respectively performing feature extraction on the first substance image, the second substance image and the first time sequence through a feature extractor in the region identification model to obtain a first depth feature, a second depth feature and a third depth feature;
the identification unit is used for carrying out region identification on the first fusion feature obtained by feature fusion of the first depth feature, the second depth feature and the third depth feature through the classification layer in the region identification model, and obtaining a region identification result of the unit to be identified.
In another aspect, an embodiment of the present application provides a computer device including a processor and a memory:
the memory is used for storing a computer program and transmitting the computer program to the processor;
the processor is configured to perform the method of any of the preceding aspects according to instructions in the computer program.
In another aspect, embodiments of the present application provide a computer-readable storage medium storing a computer program for performing the method of any one of the preceding aspects.
In another aspect, embodiments of the present application provide a computer program product comprising a computer program which, when run on a computer device, causes the computer device to perform the method of any of the preceding aspects.
According to the technical scheme, first, a spatial substance image, under an overhead view, of a space to be identified in which a plurality of objects are gathered is subjected to unit division to obtain a unit to be identified in the space to be identified and a first substance image of the unit to be identified; a second substance image of the unit to be identified under the object view angle is acquired; and key point clustering is performed, according to a preset time period, on key point information in the object activity information of the unit to be identified to obtain a first time sequence of the unit to be identified. On the basis that the first substance image describes the substance information of the unit to be identified from the overhead view and the second substance image additionally describes the substance information of the unit to be identified from the object view angle, the first time sequence further describes the object activity rule of the unit to be identified so as to further reflect the regional characteristics of the region to which the unit to be identified belongs. Then, the first substance image, the second substance image and the first time sequence are respectively input into the feature extractor in the region identification model for feature extraction to obtain a first depth feature, a second depth feature and a third depth feature; and a first fusion feature, obtained by feature fusion of the first depth feature, the second depth feature and the third depth feature, is input into the classification layer in the region identification model for region identification to obtain a region identification result of the unit to be identified. The first fusion feature fuses the substance feature of the unit to be identified under the overhead view, the substance feature of the unit to be identified under the object view angle, and the object activity feature representing the regional characteristics of the region to which the unit to be identified belongs; on the basis that the regional characteristics of the unit to be identified are described more comprehensively, region identification based on the first fusion feature makes the region identification result more accurate. In this way, even if different regions with similar substance morphology exist in the space, region identification errors can be reduced and the identification precision of the region identification result can be improved, thereby meeting the high-precision requirement of region identification in the space.
Drawings
In order to more clearly illustrate the embodiments of the application or the technical solutions in the prior art, the drawings required in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the application, and that a person skilled in the art may obtain other drawings from these drawings without inventive effort.
Fig. 1 is a schematic diagram of a system architecture of a method for identifying a region in space according to an embodiment of the present application;
FIG. 2 is a flowchart of a method for identifying a region in space according to an embodiment of the present application;
fig. 3 is a schematic diagram of a flow stage of a method for identifying a region in a space according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a model structure of a region identification model according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a portion of a sample cell in a sample space according to an embodiment of the present application;
FIG. 6 is a schematic diagram of confidence that an area recognition model recognizes that an area to which a sample unit belongs is correct according to an embodiment of the present application;
FIG. 7 is a schematic diagram of confidence distribution of a region identification model for identifying a correct region of a sample unit according to an embodiment of the present application;
FIG. 8 is a flowchart of a method for identifying a target function area in a city space according to an embodiment of the present application;
fig. 9 is a block diagram of a spatial region identification device according to an embodiment of the present application;
fig. 10 is a block diagram of a server according to an embodiment of the present application;
fig. 11 is a block diagram of a terminal according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described below with reference to the accompanying drawings.
At present, as different areas in a space are gradually planned, identifying the areas in the space in a timely and accurate manner is of great significance for space optimization and space management. For example, with the rapid development of the economy and society and the planning requirements of urban space, the range of the urban space gradually expands, different functional areas in the urban space border and merge with one another, and the boundaries between different functional areas gradually blur and disappear, forming a special phenomenon in which different functional areas coexist and intermingle. For example, the target functional area in the urban space is a non-regular living area in the urban space, which has regional characteristics such as low space quality, disordered land utilization, high building density, lack of infrastructure and poor environmental quality, and thus negatively affects the urban space to a certain extent. On this basis, identifying the target functional area in the urban space in a timely and accurate manner is very important for urban space optimization and urban space management.
In the related art, a region identification model based on artificial intelligence technology is applied to the spatial region identification method, which relies on the spatial substance image alone. The specific implementation is as follows: first, feature extraction is performed on the image blocks obtained by segmenting the spatial substance image to obtain image features; then, region identification is performed on the image features through the region identification model to obtain a region identification result. For example, feature extraction is performed on the image blocks obtained by segmenting the substance image of the urban space to obtain image features, and target functional area identification is then performed on the image features through a target functional area identification model to obtain a target functional area identification result.
However, research has shown that the spatial substance image does not describe the spatial region comprehensively enough. When different regions with similar substance morphology exist in the space, identifying the spatial region in the above manner, which depends only on the spatial substance image, easily leads to region identification errors, making it difficult to meet the high-precision requirement of spatial region identification. For example, when a target functional area and an old living area with similar substance morphology exist in the urban space, identifying the target functional area in the above manner, which depends only on the substance image of the urban space, easily leads to incorrect identification of the target functional area, making it difficult to meet the high-precision requirement of target functional area identification in the urban space.
In order to solve the above technical problems, an embodiment of the present application provides a method for identifying a region in a space. A spatial substance image, under an overhead view, of a space to be identified in which a plurality of objects are gathered is subjected to unit division to obtain a unit to be identified in the space to be identified and a first substance image of the unit to be identified; a second substance image of the unit to be identified under the object view angle is acquired; and key point clustering is performed, according to a preset time period, on key point information in the object activity information of the unit to be identified to obtain a first time sequence of the unit to be identified. On the basis that the first substance image describes the substance information of the unit to be identified from the overhead view and the second substance image additionally describes the substance information of the unit to be identified from the object view angle, the first time sequence further describes the object activity rule of the unit to be identified so as to further reflect the regional characteristics of the region to which the unit to be identified belongs. The first substance image, the second substance image and the first time sequence are respectively input into the feature extractor in the region identification model for feature extraction to obtain a first depth feature, a second depth feature and a third depth feature; and a first fusion feature, obtained by feature fusion of the first depth feature, the second depth feature and the third depth feature, is input into the classification layer in the region identification model for region identification to obtain a region identification result of the unit to be identified. The first fusion feature fuses the substance feature of the unit to be identified under the overhead view, the substance feature of the unit to be identified under the object view angle, and the object activity feature representing the regional characteristics of the region to which the unit to be identified belongs; on the basis that the regional characteristics of the unit to be identified are described more comprehensively, region identification based on the first fusion feature makes the region identification result more accurate. In this way, even if different regions with similar substance morphology exist in the space, region identification errors can be reduced and the identification precision of the region identification result can be improved, thereby meeting the high-precision requirement of region identification in the space.
Next, a system architecture of the spatial region recognition method will be described. Referring to fig. 1, fig. 1 is a schematic diagram of a system architecture of a method for identifying a region in a space according to an embodiment of the present application, where the system architecture includes a terminal 101 and a server 102.
The terminal 101 includes, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, an intelligent voice interaction device, an intelligent home appliance, a vehicle-mounted terminal, and the like. The server 102 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing cloud computing services. The terminal 101 and the server 102 may be directly or indirectly connected through wired or wireless communication, and the present application is not limited herein. For example, the terminal 101 and the server 102 may be connected through a network, which may be a wired or wireless network.
The terminal 101 collects a spatial substance image of a space to be identified under an overhead view, and sends the spatial substance image to the server 102; the server 102 performs unit division on the spatial substance image of the space to be identified under the overhead view to obtain a unit to be identified in the space to be identified and a first substance image of the unit to be identified, where a plurality of objects are gathered in the space to be identified. As an example, the space to be identified is "city A", the spatial substance image of "city A" under the overhead view is "remote sensing image A", and the object is "vehicle"; the terminal 101 may collect the "remote sensing image A" of "city A", in which a plurality of "vehicles" are gathered, under the overhead view and send the "remote sensing image A" to the server 102; the server 102 may perform unit division on the "remote sensing image A" to obtain "unit 1" as the unit to be identified in "city A" and "remote sensing image block 1" as the first substance image of "unit 1".
The server 102 acquires a second substance image of the unit to be identified under the object view angle. As an example, on the basis of the above example, the server 102 may further acquire the second substance images of "unit 1" under the "vehicle" view angle as "street view image 21", "street view image 22", …, and "street view image 2m", where m is a positive integer and m > 2.
The server 102 performs key point clustering on key point information in object activity information of the unit to be identified according to a preset time period, and obtains a first time sequence of the unit to be identified. As an example, the preset time period is "every 1 hour", the object activity information is "vehicle travel information", the key point information is "travel start point information" and "travel end point information", the key point cluster is "travel start point cluster" and "travel end point cluster", on the basis of the above example, the server 102 may further perform "travel start point cluster" and "travel end point cluster" on "travel start point information" and "travel end point information" in "vehicle travel information" of "unit 1" according to "every 1 hour" respectively, to obtain "travel start point time sequence 1" and "travel end point time sequence 1" of "unit 1", and splice "travel start point time sequence 1" and "travel end point time sequence 1" of "unit 1", to obtain the first time sequence of "unit 1" as "time sequence 1".
The server 102 performs feature extraction on the first substance image, the second substance image, and the first time series, respectively, by a feature extractor in the region identification model, to obtain a first depth feature, a second depth feature, and a third depth feature. As an example, the region recognition model is a "target functional region recognition model", and based on the above example, the server 102 may input the "remote sensing image block 1", "street view image 21", "street view image 22", … … "," street view image 2m ", and" time series 1 "into the feature extractor in the" target functional region recognition model "to perform feature extraction, so as to obtain a first depth feature as" depth feature 1", a second depth feature as" depth feature 2", and a third depth feature as" depth feature 3", respectively.
And the server 102 performs region recognition on the first fusion feature obtained by feature fusion of the first depth feature, the second depth feature and the third depth feature through a classification layer in the region recognition model, so as to obtain a region recognition result of the unit to be recognized. As an example, based on the above example, the server 102 may perform feature fusion on the "depth feature 1", "depth feature 2", and "depth feature 3", to obtain a first fusion feature "fusion feature 1", and input the "fusion feature 1" into the classification layer in the "target function area identification model" to perform target function area identification, to obtain the target function area identification result of the "unit 1".
That is, the spatial substance image, under the overhead view, of the space to be identified in which a plurality of objects are gathered is subjected to unit division to obtain the unit to be identified in the space to be identified and the first substance image of the unit to be identified; the second substance image of the unit to be identified under the object view angle is acquired; and key point clustering is performed, according to the preset time period, on the key point information in the object activity information of the unit to be identified to obtain the first time sequence of the unit to be identified. On the basis that the first substance image describes the substance information of the unit to be identified from the overhead view and the second substance image additionally describes the substance information of the unit to be identified from the object view angle, the first time sequence further describes the object activity rule of the unit to be identified so as to further reflect the regional characteristics of the region to which the unit to be identified belongs. The first substance image, the second substance image and the first time sequence are respectively input into the feature extractor in the region identification model for feature extraction to obtain the first depth feature, the second depth feature and the third depth feature; and the first fusion feature, obtained by feature fusion of the first depth feature, the second depth feature and the third depth feature, is input into the classification layer in the region identification model for region identification to obtain the region identification result of the unit to be identified. The first fusion feature fuses the substance feature of the unit to be identified under the overhead view, the substance feature of the unit to be identified under the object view angle, and the object activity feature representing the regional characteristics of the region to which the unit to be identified belongs; on the basis that the regional characteristics of the unit to be identified are described more comprehensively, region identification based on the first fusion feature makes the region identification result more accurate. In this way, even if different regions with similar substance morphology exist in the space, region identification errors can be reduced and the identification precision of the region identification result can be improved, thereby meeting the high-precision requirement of region identification in the space.
On the basis of the above example: the "remote sensing image A" describes the substance information of "unit 1" in "city A" from the overhead view, the "street view image 21", "street view image 22", …, and "street view image 2m" supplement the description of the substance information of "unit 1" from the "vehicle" view angle, and "time series 1" further describes the "vehicle" activity rule of "unit 1" so as to further reflect the regional characteristics of the region to which "unit 1" belongs. Through feature extraction and feature fusion, "fusion feature 1" fuses the substance feature of "unit 1" under the overhead view, the substance feature of "unit 1" under the "vehicle" view angle, and the "vehicle" travel feature reflecting the regional characteristics of the region to which "unit 1" belongs; on the basis that the regional characteristics of "unit 1" are described more comprehensively, region identification based on "fusion feature 1" makes the target functional area identification result of "unit 1" more accurate. In this way, even if a target functional area and an old living area with similar substance morphology exist in "city A", incorrect identification of the target functional area can be reduced and the identification precision of the target functional area identification result can be improved, thereby meeting the high-precision requirement of target functional area identification in the urban space.
The application provides a method for identifying a region in a space, which relates to artificial intelligence (Artificial Intelligence, AI) technology. Artificial intelligence is a theory, method, technology and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use the knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics.
The method for identifying a region in a space provided by the application mainly relates to major directions of artificial intelligence technology such as computer vision (Computer Vision, CV) technology, natural language processing (Natural Language Processing, NLP) technology, and machine learning (Machine Learning, ML)/deep learning. Computer vision is a science that studies how to make machines "see"; more specifically, it uses cameras and computers instead of human eyes to perform machine vision tasks such as recognition, tracking and measurement on targets, and further performs graphics processing so that the computer produces images more suitable for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies related theories and technologies in an attempt to build artificial intelligence systems that can acquire information from images or multidimensional data. Computer vision techniques typically include image processing, image recognition and image semantic understanding techniques.
Natural language processing is an important direction in the fields of computer science and artificial intelligence. It studies various theories and methods that enable effective communication between humans and computers in natural language. Natural language processing is a science that integrates linguistics, computer science and mathematics; research in this field involves natural language, that is, the language people use daily, so it is closely related to the study of linguistics. Natural language processing techniques typically include text processing and text semantic understanding techniques.
Machine learning is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It specializes in studying how computers simulate or implement human learning behavior to acquire new knowledge or skills, and reorganize existing knowledge structures to continuously improve their own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent; it is applied throughout all fields of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning and demonstration learning.
In the specific embodiment of the present application, if related data such as object activity information relates to user information, when the above embodiments of the present application are applied to specific products or technologies, individual permissions or individual agreements of users need to be obtained, and the collection, use and processing of related data need to comply with related laws and regulations and standards of related countries and regions.
In the embodiment of the present application, the computer device may be a server or a terminal, and the method provided in the embodiment of the present application may be executed by the terminal or the server alone or in combination with the terminal and the server. The embodiment corresponding to fig. 1 is mainly described by taking a method provided by the embodiment of the application executed by a server as an example.
In addition, when the method provided by the embodiment of the present application is separately executed by the terminal, the execution method is similar to the embodiment corresponding to fig. 1, and mainly the server is replaced by the terminal. In addition, when the method provided by the embodiment of the application is cooperatively executed by the terminal and the server, the steps required to be embodied on the front-end interface can be executed by the terminal, and some steps required to be calculated in the background and not required to be embodied on the front-end interface can be executed by the server.
Next, a method for identifying a region in space provided by the embodiment of the present application will be described in detail by taking a method provided by the embodiment of the present application performed by a server as an example with reference to the accompanying drawings. Referring to fig. 2, fig. 2 is a flowchart of a method for identifying a region in a space, where the method includes:
s201: performing unit division on a space substance image of a space to be identified under an overhead view to obtain a unit to be identified and a first substance image of the unit to be identified in the space to be identified; the space to be identified is aggregated with a plurality of objects.
In the embodiment of the application, a space in which region identification needs to be performed is taken as the space to be identified, and a plurality of objects are gathered in the space to be identified. To identify the regions in the space to be identified in a timely and accurate manner, a substance image describing the substance information of the space to be identified needs to be collected. Describing the substance information of the space to be identified from different view angles allows it to be described more comprehensively, and spatial substance information described from an overhead view has advantages such as a wide view range, rich substance information and a convenient acquisition manner; therefore, a spatial substance image describing the substance information of the space to be identified from an overhead view, that is, a spatial substance image of the space to be identified under the overhead view, needs to be acquired first. In practical applications, the spatial substance image of the space to be identified under the overhead view may be, for example, a remote sensing image or an unmanned aerial vehicle image.
On this basis, in view of the distribution of different areas in the space to be identified, the space to be identified is not identified directly in the subsequent steps; instead, the spatial substance image needs to be divided into units to obtain a unit to be identified in the space to be identified and a first substance image of the unit to be identified, so that the region to which the unit to be identified belongs can subsequently be identified in a timely and accurate manner. The unit to be identified may be, for example, a grid to be identified in the space to be identified obtained by meshing the spatial substance image, and the first substance image is used for describing the substance information of the unit to be identified from the overhead view.
As an example, when the method for identifying a region in space is applied to a target functional region identification scene in urban space, the space to be identified is "city a", the object is "vehicle", and the space material image of "city a" in which a plurality of "vehicles" are gathered under the overhead view is "remote sensing image a"; the server divides the units of the remote sensing image A of the city A under the overhead view to obtain the units to be identified in the city A as the unit 1 and the first substance image of the unit 1 as the remote sensing image block 1.
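As an illustrative aid for the "meshing" reading of unit division above, the following sketch tiles an overhead image into fixed-size grid cells; the numpy representation, the cell size and the assumption of an evenly divisible image are ours, not specifics of the embodiment.

```python
# Illustrative grid division of an overhead image into units (numpy).
# Assumes the image dimensions are multiples of the cell size.
import numpy as np

def divide_into_units(space_image: np.ndarray, cell: int = 256):
    """Split an (H, W, C) image into a dict mapping unit id -> image block."""
    h, w = space_image.shape[:2]
    units = {}
    for i in range(h // cell):
        for j in range(w // cell):
            block = space_image[i*cell:(i+1)*cell, j*cell:(j+1)*cell]
            units[(i, j)] = block   # (i, j) identifies the unit to be identified
    return units

# e.g. a hypothetical 1024x1024 "remote sensing image A" yields 16 first substance images
units = divide_into_units(np.zeros((1024, 1024, 3), dtype=np.uint8))
```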
S202: and acquiring a second object image of the unit to be identified under the view angle of the object.
In the embodiment of the present application, after S201 is performed to obtain the unit to be identified in the space to be identified and the first substance image of the unit to be identified, since the first substance image describes the substance information of the unit to be identified only from the overhead view, a second substance image describing the local substance information of the unit to be identified from an object view angle also needs to be acquired in order to describe the substance information of the unit to be identified more comprehensively; local substance information described from the object view angle has advantages such as capturing substance details and expressing the visual environment of objects and environmental background information. That is, a second substance image of the unit to be identified under the object view angle is acquired. In practical applications, the second substance image of the unit to be identified under the object view angle may be, for example, a street view image or a social media image.
In the specific implementation of S202, a plurality of partial substance images describing the partial substance information of the space to be identified from the object view angle, that is, a plurality of partial substance images of the space to be identified under the object view angle, may be acquired first; then, unit connection is performed on the plurality of partial substance images according to the unit to be identified, it is determined whether each of the plurality of partial substance images has a connection relation with the unit to be identified, and the partial substance images that have a connection relation with the unit to be identified are taken as the second substance image of the unit to be identified under the object view angle. Thus, the application provides a possible implementation manner; S202 may be, for example: performing unit connection on a plurality of partial substance images of the space to be identified under the object view angle according to the unit to be identified, and acquiring the second substance image.
The plurality of partial substance images of the space to be identified under the object view angle are acquired by sampling points of the space to be identified under the object view angle, and the sampling points are unevenly distributed in the space to be identified, so that the number of the partial substance images with the connection relation with the units to be identified is variable, that is, the number of the partial substance images with the connection relation with different units to be identified may be the same or different.
As an example, on the basis of the above example, the plurality of partial substance images of "city a" under the "vehicle" view angle are "street view image 21", "street view image 22", … …, and "street view image 2n", where n is a positive integer, n > 3; the server performs unit connection on the street view image 21, the street view image 22, the … … and the street view image 2n according to the unit 1 to obtain second substance images of the unit 1 under the view angle of the vehicle, namely the street view image 21, the street view image 22, the … … and the street view image 2m, wherein m is a positive integer, and m is more than 2 and less than n.
Of course, a plurality of partial substance images of the unit to be identified under the object view angle may also be directly acquired as the second substance image of the unit to be identified under the object view angle. The plurality of partial substance images of the unit to be identified under the object view angle are acquired by sampling points of the unit to be identified under the object view angle. Thus, the present application provides a possible implementation manner, and S202 may be, for example: and directly acquiring a plurality of local substance images of the unit to be identified under the view angle of the object, and determining the local substance images as second substance images.
As another example, on the basis of the above example, the server directly acquires that a plurality of partial substance images of "unit 1" under the "vehicle" view angle are "street view image 21", "street view image 22", … …, and "street view image 2m", that is, that a second substance image of "unit 1" under the "vehicle" view angle is "street view image 21", "street view image 22", … …, and "street view image 2m"; wherein m is a positive integer, and m is more than 2.
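Both variants of S202 come down to attaching geotagged object-view images to the grid unit that contains their sampling point. A minimal sketch of such "unit connection" follows; the flat local coordinate frame, the cell size and the function name are illustrative assumptions.

```python
# Illustrative "unit connection": attach geotagged street-level images to the
# grid unit whose extent contains their sampling point. The flat local frame
# (origin at the space's corner) is a simplifying assumption.
from collections import defaultdict

def connect_images_to_units(samples, origin, cell_size):
    """samples: iterable of (x, y, image_id); returns unit -> list of image ids."""
    ox, oy = origin
    connected = defaultdict(list)
    for x, y, image_id in samples:
        unit = (int((y - oy) // cell_size), int((x - ox) // cell_size))
        connected[unit].append(image_id)   # second substance images of that unit
    return connected

# Units may receive different numbers of images, since sampling is uneven.
second_images = connect_images_to_units(
    [(10.0, 30.0, "street view image 21"), (40.0, 35.0, "street view image 22")],
    origin=(0.0, 0.0), cell_size=256.0)
```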
S203: and clustering key points according to the key point information in the object activity information of the unit to be identified in a preset time period, and obtaining a first time sequence of the unit to be identified.
Because the spatial substance image does not describe the spatial region comprehensively enough, when different regions with similar substance morphology exist in the space, identifying the spatial region with the spatial region identification method of the related art, which depends only on the spatial substance image, easily leads to region identification errors, and it is difficult to meet the high-precision requirement of spatial region identification.
Therefore, in the embodiment of the application, in order to describe the region information of the region to which the unit to be identified belongs more comprehensively, the aggregation of the key point information in the object activity information of the unit to be identified within a preset time period is considered: it can dynamically describe the object activity rule of the unit to be identified and thus further reflect the regional characteristics of the region to which the unit to be identified belongs. Accordingly, object activity information dynamically describing the object activity rule of the unit to be identified also needs to be acquired, and key point clustering is performed, according to the preset time period, on the key point information in the object activity information of the unit to be identified to obtain the first time sequence of the unit to be identified, so that the region to which the unit to be identified belongs is identified more accurately based on the first time sequence. The first time sequence is used for dynamically describing the object activity rule of the unit to be identified so as to further reflect the regional characteristics of the region to which the unit to be identified belongs.
In practical application, the object activity information of the unit to be identified may be vehicle trip information of the unit to be identified, or may be vehicle check-in information or vehicle communication information.
In the implementation of S203, any one of the following two implementations may be adopted:
the specific implementation manner of S203 refers to: on the basis of acquiring object activity information for dynamically describing the object activity rule of the space to be identified, namely, acquiring the object activity information of the space to be identified, the key point information of the space to be identified can be acquired through key point extraction, then the key point information of the unit to be identified is acquired through unit connection, and finally the first time sequence of the unit to be identified is acquired through key point clustering. When the method is specifically implemented, firstly, key point extraction is carried out on object activity information of a space to be identified, so as to obtain key point information of the space to be identified; then, carrying out unit connection according to the key point information of the space to be identified according to the unit to be identified, determining whether the key point information of the space to be identified has a connection relation with the unit to be identified, and taking the key point information with the connection relation with the unit to be identified as the key point information of the unit to be identified; and finally, carrying out key point clustering on the key point information of the unit to be identified according to a preset time period to obtain a first time sequence of the unit to be identified. Thus, the present application provides one possible implementation, S203 may include, for example, the following S2031-S2033:
S2031: and extracting key points from the object activity information of the space to be identified to obtain the key point information of the space to be identified.
S2032: and carrying out unit connection according to the key point information of the space to be identified by the unit to be identified, and obtaining the key point information of the unit to be identified.
S2033: and clustering key points according to the key point information of the unit to be identified in a preset time period to obtain a first time sequence.
In the implementation manner of S2031-S2033, on the basis of the object activity information of the space to be identified, key point information of the space to be identified is obtained by extracting key points, and then key point information of the unit to be identified is obtained by connecting units; the information data quantity of the key point information of the space to be identified is far smaller than that of the object activity information of the space to be identified, so that the subsequent unit connection processing process is simpler and more convenient.
As an example, the preset time period is "every 1 hour", the object activity information is "vehicle travel information", the key point extraction is "travel start point extraction" and "travel end point extraction", the key point clustering is "travel start point clustering" and "travel end point clustering", on the basis of the above example, the server performs "travel start point extraction" and "travel end point extraction" on "vehicle travel information" of "city a" respectively, and obtains "travel start point information" and "travel end point information" of "city a"; respectively performing unit connection on the trip starting point information and the trip end point information of the city A according to the unit 1 to obtain trip starting point information and trip end point information of the unit 1; the method comprises the steps of carrying out travel starting point clustering and travel end point clustering on travel starting point information and travel end point information of a unit 1 respectively according to the travel starting point information and the travel end point information of the unit 1 every 1 hour to obtain a travel starting point time sequence 1 and a travel end point time sequence 1 of the unit 1, and splicing the travel starting point time sequence 1 and the travel end point time sequence 1 of the unit 1 to obtain a first time sequence 1 of the unit 1.
Another specific implementation manner of S203 is as follows: on the basis of acquiring object activity information that dynamically describes the object activity rule of the space to be identified, that is, the object activity information of the space to be identified, the object activity information of the unit to be identified can be obtained through unit connection, the key point information of the unit to be identified can then be obtained through key point extraction, and the first time sequence of the unit to be identified is finally obtained through key point clustering. In specific implementation, first, unit connection is performed on the object activity information of the space to be identified according to the unit to be identified, it is determined whether the object activity information of the space to be identified has a connection relation with the unit to be identified, and the object activity information that has a connection relation with the unit to be identified is taken as the object activity information of the unit to be identified; then, key point extraction is performed on the object activity information of the unit to be identified to obtain the key point information of the unit to be identified; finally, key point clustering is performed on the key point information of the unit to be identified according to the preset time period to obtain the first time sequence of the unit to be identified. Thus, the application provides a possible implementation; S203 may include, for example, the following S2034-S2036:
S2034: and performing unit connection according to the object activity information of the space to be identified by the unit to be identified, and acquiring the object activity information of the unit to be identified.
S2035: and extracting key points from the object activity information of the unit to be identified to obtain the key point information of the unit to be identified.
S2036: and clustering key points according to the key point information of the unit to be identified in a preset time period to obtain a first time sequence.
In the implementation manner of S2034-S2036, on the basis of the object activity information of the space to be identified, the object activity information of the unit to be identified is obtained through unit connection, and then the key point information of the unit to be identified is obtained through key point extraction; the information data quantity of the object activity information of the unit to be identified is far smaller than that of the object activity information of the space to be identified, so that the subsequent key point extraction processing process is simpler and more convenient.
As an example, on the basis of the above example, the server performs unit connection on the "vehicle travel information" of the "city a" according to the "unit 1" to obtain the "vehicle travel information" of the "unit 1"; the method comprises the steps of performing travel starting point extraction and travel end point extraction on vehicle travel information of a unit 1 respectively to obtain travel starting point information and travel end point information of the unit 1; the method comprises the steps of carrying out travel starting point clustering and travel end point clustering on travel starting point information and travel end point information of a unit 1 respectively according to the travel starting point information and the travel end point information of the unit 1 every 1 hour to obtain a travel starting point time sequence 1 and a travel end point time sequence 1 of the unit 1, and splicing the travel starting point time sequence 1 and the travel end point time sequence 1 of the unit 1 to obtain a first time sequence 1 of the unit 1.
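For either order of S203, the worked example suggests an hourly aggregation of trip start and end points that is then spliced into one sequence. The sketch below builds a unit's first time sequence on that reading; treating "key point clustering" as per-hour counting is our interpretation for illustration, not a statement of the claimed method.

```python
# Illustrative construction of a unit's first time sequence: count trip start
# and end points falling in the unit per one-hour bin, then splice the two
# series (per-hour aggregation is an assumed reading of "key point clustering").
import numpy as np

def unit_time_sequence(starts, ends, hours=24):
    """starts/ends: lists of trip timestamps (hour of day) inside the unit."""
    start_series = np.bincount(np.asarray(starts, dtype=int), minlength=hours)[:hours]
    end_series = np.bincount(np.asarray(ends, dtype=int), minlength=hours)[:hours]
    return np.concatenate([start_series, end_series]).astype(float)  # length 48

seq = unit_time_sequence(starts=[8, 8, 9, 18], ends=[9, 17, 18, 18])  # hypothetical trips
```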
In addition, in the embodiment of the application, in order to reduce the adverse effect on subsequent region identification when singular values exist in the first time sequence of the unit to be identified, normalization processing may be performed on the first time sequence of the unit to be identified to obtain a normalized first time sequence, and the first time sequence of the unit to be identified is updated accordingly, so that singular values are less likely to exist in the updated first time sequence, thereby reducing the aforementioned adverse effect. Thus, the application provides a possible implementation; the method may, for example, further include the following S1-S2:
s1: and carrying out normalization processing on the first time sequence to obtain a normalized first time sequence.
S2: and updating the first time sequence according to the normalized first time sequence.
In addition, in the embodiment of the present application, the execution order of S202 and S203 is not limited: S202 may be executed before S203, S203 may be executed before S202, or S202 and S203 may be executed simultaneously.
S204: and respectively extracting the characteristics of the first substance image, the second substance image and the first time sequence by a characteristic extractor in the region identification model to obtain a first depth characteristic, a second depth characteristic and a third depth characteristic.
In the embodiment of the present application, after S201-S203 are executed, a first substance image, a second substance image, and a first time sequence of the unit to be identified in the space to be identified are obtained. The first substance image describes the substance information of the unit to be identified from an overhead view, the second substance image describes the substance information of the unit to be identified from an object view, and the first time sequence dynamically describes the object activity rule of the unit to be identified, further embodying the region characteristic of the region to which the unit to be identified belongs; combining the three therefore describes the region information of the region to which the unit to be identified belongs more comprehensively. The first substance image, the second substance image and the first time sequence are respectively input into a feature extractor in the region identification model for feature extraction, so as to obtain a first depth feature corresponding to the first substance image, a second depth feature corresponding to the second substance image, and a third depth feature corresponding to the first time sequence, based on which the region to which the unit to be identified belongs can be accurately identified.
The first depth feature represents a material feature of the unit to be identified under an overhead view, the second depth feature represents a material feature of the unit to be identified under an object view, and the third depth feature represents an object activity feature representing a region characteristic of a region to which the unit to be identified belongs.
As an example, the region recognition model is a "target functional region recognition model", and on the basis of the above example, the server inputs the "remote sensing image block 1", "street view image 21", "street view image 22", … … "," street view image 2m ", and" time series 1 "into the feature extractor in the" target functional region recognition model "respectively to perform feature extraction, so as to obtain a first depth feature of" depth feature 1", a second depth feature of" depth feature 2", and a third depth feature of" depth feature 3".
In the specific implementation of S204, the first substance image, the second substance image and the first time sequence of the unit to be identified in the space to be identified are input data from three different data sources. Based on this, the feature extractor in the region identification model is built with three different networks, namely a first convolutional neural network, a convolution-cyclic neural network and a cyclic-full convolutional neural network, which respectively extract features from the input data of the three data sources to obtain the first depth feature, the second depth feature and the third depth feature.
The first convolution neural network is used for extracting features of the first substance image, and a high-dimensional feature vector is obtained to serve as a first depth feature; the convolution-circulation neural network is used for extracting features of the second object image to obtain a high-dimensional feature vector as a second depth feature; the cyclic-full convolution neural network is used for extracting features of the first time sequence, and a high-dimensional feature vector is obtained to serve as a third depth feature. Thus, the present application provides one possible implementation, the feature extractor in the region identification model includes a first convolutional neural network, a convolutional-cyclic neural network, and a cyclic-full convolutional neural network; s204 may include, for example, S2041-S2043 as follows:
S2041: and extracting the characteristics of the first substance image through a first convolutional neural network in the region identification model to obtain a first depth characteristic.
As an example, considering limitations on data volume and computing resources, the first convolutional neural network may be, for example, the classic, relatively low-complexity 18-layer residual network (ResNet18) widely used in the computer vision field, with network parameters pre-trained and fine-tuned on ImageNet, a large visual database built for visual object recognition research. The first convolutional neural network may also be another convolutional neural network; for example, when data volume and computing resources are sufficient, it may be a 101-layer residual network (ResNet101) or the like.
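A sketch of using an ImageNet-pretrained ResNet18 as the first convolutional neural network, assuming PyTorch/torchvision (consistent with the PyTorch framework mentioned later in this embodiment); the 224×224 input size and replacing the classification head with an identity layer are illustrative choices.

```python
import torch
import torchvision.models as models

# Load an ImageNet-pretrained ResNet18 and drop its classification head,
# keeping the backbone as the feature extractor for the first substance image.
backbone = models.resnet18(pretrained=True)
backbone.fc = torch.nn.Identity()          # output: 512-dimensional depth feature
backbone.eval()

image = torch.randn(1, 3, 224, 224)        # a toy first substance image tensor
with torch.no_grad():
    first_depth_feature = backbone(image)
print(first_depth_feature.shape)           # torch.Size([1, 512])
```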
S2042: and extracting the features of the second object image through a convolution-cyclic neural network in the region identification model to obtain a second depth feature.
In the specific implementation of S2042, since the number of second substance images of the unit to be identified in the space to be identified is not fixed, a convolution-cyclic neural network combining a second convolutional neural network with a first cyclic neural network is employed to handle this variable number during feature extraction. Each image in the second substance image is treated as one time step: in each time step, the second convolutional neural network extracts features from the image corresponding to that time step to obtain a high-dimensional feature vector as the first convolution feature of that time step, and the first cyclic neural network then computes on the first convolution feature to obtain the output feature of that time step. Finally, the output feature obtained at the last time step through the second convolutional neural network and the first cyclic neural network is taken as the second depth feature. Thus, the present application provides one possible implementation, where the convolution-cyclic neural network includes a second convolutional neural network and a first cyclic neural network; S2042 may include, for example, the following S3-S4:
S3: and extracting the characteristics of the second object image through a second convolution neural network in the region identification model to obtain a first convolution characteristic.
S4: and calculating the first convolution characteristic through a first cyclic neural network in the region identification model to obtain a second depth characteristic.
As an example, the convolution-cyclic neural network may be, for example, a Vision long short-term memory network (Vision-Long Short Term Memory, Vision-LSTM). The second convolutional neural network may be, for example, the classic, relatively low-complexity ResNet18, with network parameters pre-trained and fine-tuned on the Places365 dataset, which contains about 1,800,000 images across 365 scene categories; the second convolutional neural network may also be another convolutional neural network, for example ResNet101 when data volume and computing resources are sufficient. The first cyclic neural network may be, for example, a long short-term memory network (Long Short Term Memory, LSTM), or another recurrent neural network such as a gated recurrent unit (Gated Recurrent Unit, GRU) or a bidirectional recurrent neural network (Recurrent Neural Network, RNN).
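A sketch of the convolution-cyclic branch under the assumptions above: a ResNet18 encodes each street view image as one time step, and an LSTM aggregates the steps, with the last step's output taken as the second depth feature. torchvision ships only ImageNet weights, so the Places365 fine-tuning described above is not reproduced here; the hidden size and input shape are illustrative.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class ConvRecurrent(nn.Module):
    """Sketch of the convolution-cyclic branch: a ResNet18 encodes each
    street view image as one time step; an LSTM aggregates the steps."""
    def __init__(self, hidden=256):
        super().__init__()
        self.cnn = models.resnet18(pretrained=True)
        self.cnn.fc = nn.Identity()           # 512-d feature per image
        self.rnn = nn.LSTM(512, hidden, batch_first=True)

    def forward(self, images):                # images: (batch, m, 3, H, W)
        b, m = images.shape[:2]
        feats = self.cnn(images.flatten(0, 1)).view(b, m, -1)
        out, _ = self.rnn(feats)
        return out[:, -1]                     # output feature of the last time step

second_depth_feature = ConvRecurrent()(torch.randn(1, 5, 3, 224, 224))
print(second_depth_feature.shape)             # torch.Size([1, 256])
```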
S2043: and extracting the characteristics of the first time sequence through a cyclic-full convolution neural network in the region identification model to obtain a third depth characteristic.
In the specific implementation of S2043, the cyclic-full convolutional neural network is obtained by combining a second cyclic neural network and a full convolutional neural network, where the second cyclic neural network is used for extracting features of the first time sequence, and a feature vector with high dimensionality is obtained as a second convolutional feature; the full convolution neural network is used for extracting the characteristics of the first time sequence to obtain a high-dimensional characteristic vector as a time characteristic; based on this, the second convolution feature and the temporal feature are fused to obtain a third depth feature. Thus, the present application provides one possible implementation, the cyclic-full convolutional neural network comprising a second cyclic neural network and a full convolutional neural network; s2043 may include, for example, S5-S7 as follows:
s5: and extracting the characteristics of the first time sequence through a second cyclic neural network in the region identification model to obtain second convolution characteristics.
S6: and extracting the characteristics of the first time sequence through the full convolution neural network in the area identification model to obtain the time characteristics.
S7: and carrying out fusion processing on the second convolution characteristic and the time characteristic to obtain a third depth characteristic.
The fusion process may be, for example, a splicing process, and of course, in the embodiment of the present application, a specific implementation manner of the fusion process is not limited, so long as fusion of the second convolution feature and the time feature is achieved.
As an example, the cyclic-full convolutional neural network may be, for example, a long short-term memory-fully convolutional network (Long Short Term Memory-Fully Convolutional Networks, LSTM-FCN); the second cyclic neural network may be, for example, an LSTM, or another recurrent neural network such as a GRU or bidirectional RNN; all layers in the fully convolutional network (Fully Convolutional Networks, FCN) are convolutional layers.
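A sketch of an LSTM-FCN-style cyclic-full convolutional branch follows. The layer widths and kernel sizes follow common LSTM-FCN configurations rather than anything fixed by the embodiment, and the 48-dimensional input matches the hourly start/end series assumed earlier; `padding='same'` requires PyTorch 1.9 or later.

```python
import torch
import torch.nn as nn

class RecurrentFCN(nn.Module):
    """Sketch of the cyclic-full convolutional branch: an LSTM and a 1-D FCN
    each encode the first time sequence, and the two encodings are spliced
    into the third depth feature (S5-S7)."""
    def __init__(self, hidden=128):
        super().__init__()
        self.rnn = nn.LSTM(1, hidden, batch_first=True)
        self.fcn = nn.Sequential(
            nn.Conv1d(1, 128, 8, padding='same'), nn.BatchNorm1d(128), nn.ReLU(),
            nn.Conv1d(128, 256, 5, padding='same'), nn.BatchNorm1d(256), nn.ReLU(),
            nn.Conv1d(256, 128, 3, padding='same'), nn.BatchNorm1d(128), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )

    def forward(self, series):                            # series: (batch, length)
        rnn_out, _ = self.rnn(series.unsqueeze(-1))
        rnn_feat = rnn_out[:, -1]                         # second convolution feature
        fcn_feat = self.fcn(series.unsqueeze(1)).squeeze(-1)  # time feature
        return torch.cat([rnn_feat, fcn_feat], dim=1)     # fusion by splicing

third_depth_feature = RecurrentFCN()(torch.randn(2, 48))
print(third_depth_feature.shape)                          # torch.Size([2, 256])
```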
S205: and carrying out region identification on the first fusion feature obtained by carrying out feature fusion on the first depth feature, the second depth feature and the third depth feature through a classification layer in the region identification model, and obtaining a region identification result of the unit to be identified.
In the embodiment of the present application, after S204 obtains the first depth feature, the second depth feature and the third depth feature, where the first depth feature represents the substance feature of the unit to be identified under the overhead view, the second depth feature represents the substance feature of the unit to be identified under the object view, and the third depth feature represents the object activity feature embodying the region characteristic of the region to which the unit to be identified belongs, the three depth features are combined through feature fusion to obtain a first fusion feature that fuses these three kinds of features. The first fusion feature is then input into the classification layer in the region identification model for region identification, that is, the region to which the unit to be identified belongs is predicted as the region identification result of the unit to be identified.
As an example, based on the above example, the server performs fusion processing on the "depth feature 1", "depth feature 2" and "depth feature 3" to obtain a first fusion feature "fusion feature 1", and inputs the "fusion feature 1" into the classification layer in the "target function area identification model" to perform target function area identification, so as to obtain the target function area identification result of the "unit 1".
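A minimal sketch of the fusion-and-classification step: the three depth features are spliced and fed to a linear classification layer. The feature dimensions and the two-class output are illustrative assumptions.

```python
import torch
import torch.nn as nn

f1 = torch.randn(1, 512)   # first depth feature (overhead view)
f2 = torch.randn(1, 256)   # second depth feature (object view)
f3 = torch.randn(1, 256)   # third depth feature (object activity)

classifier = nn.Linear(512 + 256 + 256, 2)   # e.g. target region vs. not
fused = torch.cat([f1, f2, f3], dim=1)       # first fusion feature
region_logits = classifier(fused)
region_result = region_logits.argmax(dim=1)  # region identification result
```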
On the basis of the descriptions of S201 to S205, referring to fig. 3, fig. 3 is a schematic diagram illustrating a flow stage of a spatial region identification method according to an embodiment of the present application; wherein, the steps S201-S205 can be divided into two flow stages; the first process stage may be that a spatial substance image of a space to be identified under an overhead view, a plurality of local substance images of the space to be identified under an object view, and object activity information of the space to be identified are respectively subjected to data preprocessing to obtain a first substance image of a unit to be identified in the space to be identified, a second substance image of the unit to be identified, and a first time sequence of the unit to be identified; the second process stage may be to input the first substance image, the second substance image, and the first time sequence into the region identification model for region identification, so as to obtain a region identification result of the unit to be identified.
The first process stage specifically refers to: performing unit division on a space substance image of a space to be identified under an overhead view to obtain a unit to be identified and a first substance image of the unit to be identified in the space to be identified; performing unit connection on a plurality of partial substance images of a space to be identified under an object view angle according to the unit to be identified, and acquiring a second substance image of the unit to be identified; extracting key points of the object activity information of the space to be identified, obtaining key point information of the space to be identified, performing unit connection according to the key point information of the space to be identified, obtaining the key point information of the unit to be identified, and performing key point clustering on the key point information of the unit to be identified according to a preset time period to obtain a first time sequence.
Referring to fig. 4, fig. 4 is a schematic diagram of a model structure of a region identification model according to an embodiment of the present application; the region identification model includes a feature extractor and a classification layer, and the feature extractor includes a first convolutional neural network, a convolution-cyclic neural network and a cyclic-full convolutional neural network. Based on this, the second process stage refers to: inputting the first substance image of the unit to be identified into the first convolutional neural network for feature extraction to obtain a first depth feature, inputting the second substance image of the unit to be identified into the convolution-cyclic neural network for feature extraction to obtain a second depth feature, and inputting the first time sequence of the unit to be identified into the cyclic-full convolutional neural network for feature extraction to obtain a third depth feature; and inputting the first fusion feature obtained by feature fusion of the first depth feature, the second depth feature and the third depth feature into the classification layer for region identification, so as to obtain the region identification result of the unit to be identified.
In addition, in the embodiment of the present application, the region identification model is obtained by training a preset identification model according to a third substance image of a sample unit in a sample space under an overhead view, a fourth substance image of the sample unit under an object view, object activity information of the sample unit, and region label data identifying the region to which the sample unit belongs. The specific training process is as follows. First, training samples for training the preset identification model into the region identification model need to be obtained: the spatial substance image of the sample space under the overhead view is subjected to unit division to obtain a sample unit in the sample space and a third substance image of the sample unit; a fourth substance image of the sample unit under the object view is acquired; and key point clustering is performed on the key point information in the object activity information of the sample unit according to a preset time period to obtain a second time sequence of the sample unit. Based on this, the third substance image, the fourth substance image and the second time sequence of the sample unit constitute one training sample, and the sample unit has region label data identifying the region to which it belongs.
Then, consider a step-by-step identification mode in which, on the basis of the third substance image, the fourth substance image and the second time sequence, features are first extracted and regions are then identified: when the model parameters of the preset identification model are subsequently trained, the parameters of the feature extraction process are difficult to train, so feature extraction in the step-by-step mode is not accurate enough. To solve this problem, the feature extraction process and the region identification process are fused in the preset identification model; that is, the preset identification model includes a feature extractor and a classification layer. The third substance image, the fourth substance image and the second time sequence are respectively input into the feature extractor in the preset identification model for feature extraction, so as to obtain a fourth depth feature representing the substance feature of the sample unit under the overhead view, a fifth depth feature representing the substance feature of the sample unit under the object view, and a sixth depth feature representing the region characteristic of the region to which the sample unit belongs; and a second fusion feature obtained by feature fusion of the fourth depth feature, the fifth depth feature and the sixth depth feature is input into the classification layer in the preset identification model for region identification, so as to obtain a region prediction result of the sample unit.
Finally, based on the region label data of the sample unit and the region prediction result of the sample unit, the loss is calculated using the loss function of the preset identification model, and the model parameters of the preset identification model, namely the model parameters of the feature extractor and the classification layer, are trained until the preset identification model converges or a preset number of iterations is reached; the trained preset identification model is taken as the region identification model. Thus, the present application provides one possible implementation, and the training step of the region identification model may include, for example, the following S8-S13:
s8: performing unit division on a space material image of the sample space in an overhead view to obtain a sample unit in the sample space and a third material image of the sample unit; the sample cell has region tag data identifying the region to which the sample cell belongs.
As an example, the sample space is "city B" in which a plurality of "vehicles" are gathered, and the spatial substance image of "city B" under the overhead view is "remote sensing image B", which is a remote sensing image covering the whole of "city B"; its resolution may be 0.6 meters, and it includes three channels of red, green and blue. The target functional area label data in "remote sensing image B" are annotated with reference to remote sensing images, street view images and planning documents; the sample units in "city B" are obtained by unit division at a granularity of 500 meters, sample units belonging to the target functional area are taken as positive samples, and sample units not belonging to the target functional area are taken as negative samples. In addition, in order to later verify the validity of the region identification model, 20% of all sample units are randomly selected as a test set, and the remaining 80% serve as a training set. Referring to fig. 5, fig. 5 is a schematic diagram of some sample units in a sample space according to an embodiment of the present application; these sample units include positive samples in the training set, negative samples in the training set, positive samples in the test set, and negative samples in the test set.
S9: a fourth substance image of the sample cell at the object's perspective is acquired.
As an example, on the basis of the above example, the fourth substance image of a sample unit in "city B" under the "vehicle" view may be "street view images" acquired from the "X map" by accessing an application interface at sampling points spaced every 50 meters along the road network of "city B"; each sampling point under the "vehicle" view provides images in the four directions of front, rear, left and right, together with the coordinate position of the sampling point.
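A sketch of sampling points at a fixed spacing along a road, assuming the road is given as a polyline in meter coordinates; a full road network would apply the same walk to every road segment. All names here are hypothetical.

```python
import math

def sample_points(polyline, step=50.0):
    """Sample points every `step` meters along a polyline of (x, y)
    coordinates in meters, carrying leftover distance across segments."""
    points, carry = [polyline[0]], 0.0
    for (x0, y0), (x1, y1) in zip(polyline, polyline[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        d = step - carry                     # distance to the next sample
        while d <= seg:
            t = d / seg
            points.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            d += step
        carry = (carry + seg) % step         # distance walked since last sample
    return points

print(len(sample_points([(0, 0), (120, 0), (120, 90)])))  # 5 points over 210 m
```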
S10: and performing key point clustering on the object activity information of the sample unit according to a preset time period to obtain a second time sequence of the sample unit.
As an example, on the basis of the above example, the object activity information of the sample cell in "city B" is "vehicle travel information", which may specifically be information including a vehicle number, a vehicle position, a vehicle travel time, vehicle start point information, vehicle end point information, and the like, which are acquired by a sensor such as an intelligent locator on the vehicle.
S11: and respectively extracting the characteristics of the third substance image, the fourth substance image and the second time sequence by a characteristic extractor in a preset recognition model to obtain a fourth depth characteristic, a fifth depth characteristic and a sixth depth characteristic.
S12: and carrying out region identification on the second fusion feature obtained by carrying out feature fusion on the fourth depth feature, the fifth depth feature and the sixth depth feature through a classification layer in a preset identification model, and obtaining a region prediction result of the sample unit.
S13: and training model parameters of a preset recognition model according to the region label data, the region prediction result and a loss function of the preset recognition model to obtain the region recognition model.
The numbers of sample units in different regions may differ greatly. When the number difference of sample units across regions is larger than a preset difference, in order to prevent the trained region identification model from being seriously biased toward the region with the larger number of sample units during region identification, a weighted cross entropy loss function needs to be adopted as the loss function of the preset identification model during training, where the weights in the weighted cross entropy loss function are used to balance the number difference. Therefore, the present application provides one possible implementation: when the number difference of sample units in different regions is greater than the preset difference, the loss function is specifically a weighted cross entropy loss function, and the weights in the weighted cross entropy loss function are used to balance the number difference.
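A sketch of the weighted cross entropy loss, assuming inverse-frequency class weights as one common way to balance the number difference; the sample counts are illustrative.

```python
import torch
import torch.nn as nn

counts = torch.tensor([900.0, 100.0])            # e.g. negative vs. positive samples
weights = counts.sum() / (len(counts) * counts)  # inverse-frequency class weights
criterion = nn.CrossEntropyLoss(weight=weights)  # weighted cross entropy

logits = torch.randn(8, 2)                       # region prediction results
labels = torch.randint(0, 2, (8,))               # region label data
loss = criterion(logits, labels)
```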
In addition, in the embodiment of the present application, referring to the description of the feature extractor in the region identification model, the feature extractor in the preset identification model likewise includes a first convolutional neural network, a convolution-cyclic neural network and a cyclic-full convolutional neural network, where the convolution-cyclic neural network includes a second convolutional neural network and a first cyclic neural network. Since the second convolutional neural network may be a ResNet18 pre-trained and fine-tuned on Places365, and the images in Places365 are very similar to those in the fourth substance image, in order to save processing resources of the graphics processor (Graphics Processing Unit, GPU), reduce the training complexity of the preset identification model and avoid overfitting of the region identification model, when the model parameters of the preset identification model are trained, only the first convolutional neural network in the preset identification model is trained and the model parameters of the second convolutional neural network are frozen. Thus, the present application provides one possible implementation, where the feature extractor in the preset identification model includes a first convolutional neural network, a convolution-cyclic neural network and a cyclic-full convolutional neural network, the convolution-cyclic neural network including a second convolutional neural network and a first cyclic neural network, and the method may further include S14: when training the model parameters of the preset identification model, freezing the model parameters of the second convolutional neural network in the preset identification model, and training the first convolutional neural network in the preset identification model.
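A sketch of S14 in PyTorch: freeze the second convolutional neural network and pass only trainable parameters to the optimizer. The trainable head shown alongside is purely illustrative of the remaining trainable parts of the preset identification model.

```python
import torch
import torchvision.models as models

# Freeze the second convolutional neural network (the per-image backbone
# of the convolution-cyclic branch); its parameters receive no updates.
second_cnn = models.resnet18(pretrained=True)
for param in second_cnn.parameters():
    param.requires_grad = False

# Illustrative trainable part standing in for the rest of the model.
head = torch.nn.Linear(512, 2)
trainable = [p for p in head.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=0.1, momentum=0.9)
```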
In the process of training the preset identification model to obtain the region identification model, the optimizer used is stochastic gradient descent (Stochastic Gradient Descent, SGD) with the momentum parameter set to 0.9. A learning rate warm-up strategy may also be used, with a warm-up coefficient of 0.0035 and 10 warm-up rounds; after warm-up, training continues with an initial learning rate of 0.1, which is then decayed using cosine annealing. In addition, an early stopping method may also be used to avoid overfitting of the region identification model.
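A sketch of this schedule, assuming the 0.0035 coefficient is the warm-up starting learning-rate factor and a 100-round training run; `LinearLR` and `SequentialLR` are available from PyTorch 1.10 onward, matching the framework version named below.

```python
import torch

params = [torch.nn.Parameter(torch.randn(4))]          # placeholder parameters
optimizer = torch.optim.SGD(params, lr=0.1, momentum=0.9)

# 10-round linear warm-up from 0.0035 * 0.1, then cosine annealing from 0.1.
warmup = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=0.0035, total_iters=10)
cosine = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=90)
scheduler = torch.optim.lr_scheduler.SequentialLR(
    optimizer, schedulers=[warmup, cosine], milestones=[10])

for epoch in range(100):
    # ... one training round over the training set ...
    scheduler.step()
```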
Based on the above description of training the preset identification model to obtain the region identification model, the accuracy of the region identification model on the test set may be, for example, 0.920, the Kappa coefficient measuring its identification accuracy may be 0.720, and the F1 index serving as its evaluation index may be 0.773. The table below shows the accuracy, Kappa coefficient and F1 index of region identification models trained on one or more of the first substance image of the unit to be identified, the second substance image of the unit to be identified, and the first time sequence of the unit to be identified. The accuracy of a region identification model trained on the first substance image alone or the second substance image alone can reach more than 0.8, and the accuracy of a region identification model trained on the first substance image and the second substance image together is obviously improved; in particular, the accuracy of the region identification model trained on the first substance image, the second substance image and the first time sequence together can reach 0.920, with a Kappa coefficient of 0.720 and an F1 index of 0.773, demonstrating the effectiveness of fusing the three kinds of data.
Table: accuracy, Kappa coefficient and F1 index of region identification models trained on different combinations of I1, I2 and T1.
In the table, I1 represents the first substance image of the unit to be identified, I2 represents the second substance image of the unit to be identified, and T1 represents the first time sequence of the unit to be identified; CNN1 represents the first convolutional neural network, CNN2-RNN1 represents the convolution-cyclic neural network combining the second convolutional neural network and the first cyclic neural network, and RNN2-FCN represents the cyclic-full convolutional neural network combining the second cyclic neural network and the full convolutional neural network.
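A sketch of computing the three reported metrics with scikit-learn; the label arrays are illustrative stand-ins for test-set labels and predictions.

```python
from sklearn.metrics import accuracy_score, cohen_kappa_score, f1_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # region label data of test sample units
y_pred = [1, 0, 1, 0, 0, 0, 1, 1]   # region prediction results

print(accuracy_score(y_true, y_pred))     # accuracy
print(cohen_kappa_score(y_true, y_pred))  # Kappa coefficient
print(f1_score(y_true, y_pred))           # F1 index
```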
Referring to fig. 6, fig. 6 is a schematic diagram of the confidence with which the region identification model correctly identifies the region to which a sample unit belongs, according to an embodiment of the present application; only 20% of the correctly identified sample units have a confidence below 0.7, and the average confidence is 0.897, indicating that the region identification model successfully captures the region features of most sample units, so that the confidence of correct identification is at a high level for most sample units. Similarly, referring to fig. 7, fig. 7 is a schematic diagram of the distribution of this confidence according to an embodiment of the present application; the distribution indicates that the region identification model's ability to correctly identify most regions is at a high level.
In addition, the training process of the preset identification model is completed under the PyTorch 1.10.0 framework using Python 3.7, and the GPU may be, for example, an NVIDIA GeForce RTX 2080 Ti.
According to the technical scheme, firstly, the spatial substance image, under an overhead view, of a space to be identified in which a plurality of objects are gathered is subjected to unit division to obtain a unit to be identified in the space to be identified and a first substance image of the unit to be identified; a second substance image of the unit to be identified under the object view is acquired; and key point clustering is performed on the key point information in the object activity information of the unit to be identified according to a preset time period to obtain a first time sequence of the unit to be identified. On the basis that the first substance image describes the substance information of the unit to be identified from the overhead view and the second substance image additionally describes it from the object view, the first time sequence further describes the object activity rule of the unit to be identified, so as to further reflect the region characteristic of the region to which the unit to be identified belongs. Then, the first substance image, the second substance image and the first time sequence are respectively input into the feature extractor in the region identification model for feature extraction to obtain a first depth feature, a second depth feature and a third depth feature; and the first fusion feature obtained by feature fusion of the three depth features is input into the classification layer in the region identification model for region identification to obtain the region identification result of the unit to be identified. Because the first fusion feature fuses the substance feature of the unit to be identified under the overhead view, the substance feature of the unit to be identified under the object view, and the object activity feature representing the region characteristic of the region to which the unit to be identified belongs, the region characteristics of the unit to be identified are described more comprehensively, and region identification based on the first fusion feature makes the region identification result more accurate. In this way, even if different regions with similar substance morphology exist in the space, region identification errors can be reduced and the identification precision of the region identification result can be improved, thereby meeting the high precision requirement of region identification in space.
Based on the foregoing description, when the spatial region identification method is applied to a target functional region identification scene in an urban space, taking the space to be identified as the city to be identified, the object as a vehicle, the spatial substance image as a remote sensing image, the first substance image as a remote sensing image block, the plurality of local substance images as a plurality of street view images, the second substance image as a street view image, the object activity information as vehicle travel information, the key points as a travel starting point and a travel end point, the preset time period as every 1 hour, and the region identification model as a target functional region identification model as examples, the embodiment of the application provides a detailed description of the target functional region identification method in the urban space. Referring to fig. 8, fig. 8 is a flowchart of a method for identifying a target function area in a city space according to an embodiment of the present application, where the method may be executed by a server, and includes the following steps:
s801: performing unit division on the remote sensing image of the city to be identified under the overhead view to obtain a unit to be identified in the city to be identified and a remote sensing image block of the unit to be identified; the city to be identified is populated with a plurality of vehicles.
S802: and carrying out unit connection on a plurality of street view images of the city to be identified under the vehicle view angle according to the unit to be identified, and obtaining the street view images of the unit to be identified.
S803: and respectively carrying out travel starting point extraction and travel end point extraction on the travel information of the vehicles in the city to be identified, and obtaining travel starting point information and travel end point information of the city to be identified.
S804: and respectively carrying out unit connection according to the travel starting point information and the travel end point information of the city to be identified by the unit to be identified, and obtaining the travel starting point information and the travel end point information of the unit to be identified.
S805: and respectively carrying out travel starting point clustering and travel end point clustering according to the travel starting point information and the travel end point information of the unit to be identified every 1 hour, and obtaining a first time sequence obtained by fusing the travel starting point time sequence and the travel end point time sequence of the unit to be identified.
S806: and extracting the characteristics of the remote sensing image block through a first convolution neural network in the target functional area identification model to obtain a first depth characteristic.
S807: and extracting features of the street view image of the unit to be identified through a convolution-cyclic neural network in the target functional area identification model to obtain a second depth feature.
S808: and extracting the characteristics of the first time sequence through a cyclic-full convolution neural network in the target functional area identification model to obtain a third depth characteristic.
S809: and carrying out target functional area recognition on the first fusion feature obtained by feature fusion of the first depth feature, the second depth feature and the third depth feature through a classification layer in the target functional area recognition model to obtain a target functional area recognition result of the unit to be recognized.
Of course, the method for identifying the region in the space provided by the embodiment of the application can also be applied to other functional region identification scenes in the urban space, or land utilization region identification scenes in the urban space, or other specific region identification scenes in other spaces, and the like.
It should be noted that, based on the implementation manner provided in the above aspects, further combinations may be further performed to provide further implementation manners.
Based on the spatial region identification method provided in the corresponding embodiment of fig. 2, the embodiment of the present application further provides a spatial region identification device, referring to fig. 9, fig. 9 is a structural diagram of the spatial region identification device provided in the embodiment of the present application, where the spatial region identification device 900 includes: a dividing unit 901, an acquiring unit 902, a clustering unit 903, an extracting unit 904, and an identifying unit 905;
the dividing unit 901 is configured to perform unit division on a spatial substance image of a space to be identified in an overhead view, and obtain a unit to be identified in the space to be identified and a first substance image of the unit to be identified; the space to be identified is gathered with a plurality of objects;
An acquiring unit 902, configured to acquire a second object image of the unit to be identified under the object perspective;
the clustering unit 903 is configured to perform key point clustering on key point information in object activity information of a unit to be identified according to a preset time period, so as to obtain a first time sequence of the unit to be identified;
an extracting unit 904, configured to extract, by using a feature extractor in the region identification model, features of the first substance image, the second substance image, and the first time sequence, respectively, to obtain a first depth feature, a second depth feature, and a third depth feature;
the identifying unit 905 is configured to perform region identification on a first fusion feature obtained by feature fusion of the first depth feature, the second depth feature, and the third depth feature through a classification layer in the region identification model, so as to obtain a region identification result of the unit to be identified.
In one possible implementation manner, the acquiring unit 902 is specifically configured to:
and performing unit connection on a plurality of partial substance images of the space to be identified under the object view angle according to the unit to be identified, and acquiring a second substance image.
In one possible implementation, the clustering unit 903 is specifically configured to:
extracting key points from the object activity information of the space to be identified to obtain the key point information of the space to be identified;
Performing unit connection on the key point information of the space to be identified according to the unit to be identified, to acquire the key point information of the unit to be identified;
and performing key point clustering on the key point information of the unit to be identified according to a preset time period, to obtain a first time sequence.
In one possible implementation, the clustering unit 903 is specifically configured to:
performing unit connection on the object activity information of the space to be identified according to the unit to be identified, to acquire the object activity information of the unit to be identified;
extracting key points from the object activity information of the unit to be identified to obtain the key point information of the unit to be identified;
and performing key point clustering on the key point information of the unit to be identified according to a preset time period, to obtain a first time sequence.
In one possible implementation, the apparatus further includes: a normalization unit and an update unit;
the normalization unit is used for carrying out normalization processing on the first time sequence to obtain a normalized first time sequence;
and the updating unit is used for updating the first time sequence according to the normalized first time sequence.
In one possible implementation, the feature extractor in the region identification model includes a first convolutional neural network, a convolution-cyclic neural network, and a cyclic-full convolutional neural network;
The extracting unit 904 is specifically configured to:
extracting features of the first substance image through a first convolutional neural network in the region identification model to obtain a first depth feature;
performing feature extraction on the second object image through a convolution-cyclic neural network in the region identification model to obtain a second depth feature;
and extracting the characteristics of the first time sequence through a cyclic-full convolution neural network in the region identification model to obtain a third depth characteristic.
In one possible implementation, the convolutional-recurrent neural network includes a second convolutional neural network and a first recurrent neural network;
the extracting unit 904 is specifically configured to:
extracting features of the second object image through a second convolution neural network in the region identification model to obtain a first convolution feature;
and calculating the first convolution characteristic through a first cyclic neural network in the region identification model to obtain a second depth characteristic.
In one possible implementation, the cyclic-full convolutional neural network includes a second cyclic neural network and a full convolutional neural network;
the extracting unit 904 is specifically configured to:
extracting the characteristics of the first time sequence through a second cyclic neural network in the region identification model to obtain second convolution characteristics;
Performing feature extraction on the first time sequence through a full convolution neural network in the region identification model to obtain time features;
and carrying out fusion processing on the second convolution characteristic and the time characteristic to obtain a third depth characteristic.
In one possible implementation, the apparatus further includes: a training unit;
the training unit is specifically used for:
performing unit division on a space material image of the sample space in an overhead view to obtain a sample unit in the sample space and a third material image of the sample unit; the sample unit has region tag data identifying a region to which the sample unit belongs;
acquiring a fourth substance image of the sample unit under the object view angle;
performing key point clustering on key point information in object activity information of the sample unit according to a preset time period to obtain a second time sequence of the sample unit;
respectively extracting the characteristics of the third substance image, the fourth substance image and the second time sequence through a characteristic extractor in a preset recognition model to obtain a fourth depth characteristic, a fifth depth characteristic and a sixth depth characteristic;
performing region identification on a second fusion feature obtained by feature fusion of a fourth depth feature, a fifth depth feature and a sixth depth feature through a classification layer in a preset identification model to obtain a region prediction result of a sample unit;
And training model parameters of a preset recognition model according to the region label data, the region prediction result and a loss function of the preset recognition model to obtain the region recognition model.
In one possible implementation, when the number difference value of the sample units in the different regions is greater than the preset difference value, the loss function is specifically a weighted cross entropy loss function, and the weight in the weighted cross entropy loss function is used for balancing the number difference value.
In one possible implementation, the feature extractor in the preset identification model includes a first convolutional neural network, a convolution-cyclic neural network, and a cyclic-full convolutional neural network, where the convolution-cyclic neural network includes a second convolutional neural network and a first cyclic neural network, and the apparatus further includes: a freezing unit;
and the freezing unit is used for freezing the model parameters of the second convolutional neural network in the preset recognition model and training the first convolutional neural network in the preset recognition model when training the model parameters of the preset recognition model.
According to the technical scheme, firstly, the spatial substance image, under an overhead view, of a space to be identified in which a plurality of objects are gathered is subjected to unit division to obtain a unit to be identified in the space to be identified and a first substance image of the unit to be identified; a second substance image of the unit to be identified under the object view is acquired; and key point clustering is performed on the key point information in the object activity information of the unit to be identified according to a preset time period to obtain a first time sequence of the unit to be identified. On the basis that the first substance image describes the substance information of the unit to be identified from the overhead view and the second substance image additionally describes it from the object view, the first time sequence further describes the object activity rule of the unit to be identified, so as to further reflect the region characteristic of the region to which the unit to be identified belongs. Then, the first substance image, the second substance image and the first time sequence are respectively input into the feature extractor in the region identification model for feature extraction to obtain a first depth feature, a second depth feature and a third depth feature; and the first fusion feature obtained by feature fusion of the three depth features is input into the classification layer in the region identification model for region identification to obtain the region identification result of the unit to be identified. Because the first fusion feature fuses the substance feature of the unit to be identified under the overhead view, the substance feature of the unit to be identified under the object view, and the object activity feature representing the region characteristic of the region to which the unit to be identified belongs, the region characteristics of the unit to be identified are described more comprehensively, and region identification based on the first fusion feature makes the region identification result more accurate. In this way, even if different regions with similar substance morphology exist in the space, region identification errors can be reduced and the identification precision of the region identification result can be improved, thereby meeting the high precision requirement of region identification in space.
The embodiment of the present application further provides a computer device, which may be a server. Referring to fig. 10, fig. 10 is a block diagram of a server provided by the embodiment of the present application. The server 1000 may vary considerably in configuration or performance, and may include one or more processors, such as central processing units (Central Processing Unit, CPU) 1022, a memory 1032, and one or more storage media 1030 (such as one or more mass storage devices) storing application programs 1042 or data 1044. The memory 1032 and the storage medium 1030 may be transitory or persistent. The program stored on the storage medium 1030 may include one or more modules (not shown), each of which may include a series of instruction operations on the server. Further, the central processor 1022 may be configured to communicate with the storage medium 1030 to perform the series of instruction operations in the storage medium 1030 on the server 1000.
The server 1000 may also include one or more power supplies 1026, one or more wired or wireless network interfaces 1050, one or more input/output interfaces 1058, and/or one or more operating systems 1041, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.
In this embodiment, the following steps may be performed by the central processor 1022 in the server 1000:
performing unit division on a space substance image of a space to be identified under an overhead view to obtain a unit to be identified and a first substance image of the unit to be identified in the space to be identified; the space to be identified is gathered with a plurality of objects;
acquiring a second object image of the unit to be identified under the view angle of the object;
performing key point clustering on key point information in object activity information of a unit to be identified according to a preset time period to obtain a first time sequence of the unit to be identified;
respectively extracting the characteristics of the first substance image, the second substance image and the first time sequence through a characteristic extractor in the region identification model to obtain a first depth characteristic, a second depth characteristic and a third depth characteristic;
and carrying out region identification on the first fusion feature obtained by carrying out feature fusion on the first depth feature, the second depth feature and the third depth feature through a classification layer in the region identification model, and obtaining a region identification result of the unit to be identified.
The computer device provided by the embodiment of the present application may also be a terminal, and referring to fig. 11, fig. 11 is a structural diagram of the terminal provided by the embodiment of the present application. Taking a terminal as an example of a smart phone, the smart phone comprises: radio Frequency (RF) circuitry 1110, memory 1120, input unit 1130, display unit 1140, sensors 1150, audio circuit 1160, wireless fidelity (Wireless Fidelity, wiFi) module 1170, processor 1180, power supply 1190, and the like. The input unit 1130 may include a touch panel 1131 and other input devices 1132, the display unit 1140 may include a display panel 1141, and the audio circuit 1160 may include a speaker 1161 and a microphone 1162. Those skilled in the art will appreciate that the smartphone structure shown in fig. 11 is not limiting of the smartphone and may include more or fewer components than shown, or may combine certain components, or a different arrangement of components.
The memory 1120 may be used to store software programs and modules, and the processor 1180 executes various functional applications and data processing of the smartphone by executing the software programs and modules stored in the memory 1120. The memory 1120 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data (such as audio data, phonebooks, etc.) created according to the use of the smart phone, etc. In addition, memory 1120 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device.
The processor 1180 is a control center of the smart phone, connects various parts of the entire smart phone using various interfaces and lines, performs various functions of the smart phone and processes data by running or executing software programs and/or modules stored in the memory 1120, and invoking data stored in the memory 1120. In the alternative, processor 1180 may include one or more processing units; preferably, the processor 1180 may integrate an application processor and a modem processor, wherein the application processor primarily handles operating systems, user interfaces, applications, etc., and the modem processor primarily handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 1180.
In this embodiment, the processor 1180 in the smart phone may perform the following steps:
performing unit division on a space substance image of a space to be identified under an overhead view to obtain a unit to be identified and a first substance image of the unit to be identified in the space to be identified; the space to be identified is gathered with a plurality of objects;
acquiring a second object image of the unit to be identified under the view angle of the object;
performing key point clustering on key point information in object activity information of a unit to be identified according to a preset time period to obtain a first time sequence of the unit to be identified;
respectively extracting the characteristics of the first substance image, the second substance image and the first time sequence through a characteristic extractor in the region identification model to obtain a first depth characteristic, a second depth characteristic and a third depth characteristic;
and carrying out region identification on the first fusion feature obtained by carrying out feature fusion on the first depth feature, the second depth feature and the third depth feature through a classification layer in the region identification model, and obtaining a region identification result of the unit to be identified.
According to an aspect of the present application, there is provided a computer-readable storage medium for storing a computer program for implementing the spatial region identification method according to the foregoing embodiments.
According to one aspect of the present application, there is provided a computer program product comprising a computer program stored in a computer readable storage medium. The processor of the computer device reads the computer program from the computer-readable storage medium, and the processor executes the computer program so that the computer device performs the methods provided in the various alternative implementations of the above embodiments.
The description of each process or structure corresponding to the drawings has its own emphasis; for parts of a process or structure that are not described in detail, reference may be made to the descriptions of other processes or structures.
The terms "first," "second," "third," "fourth," and the like in the description of the application and in the above figures, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented, for example, in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In the several embodiments provided in the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
If the integrated units are implemented in the form of software functional units and sold or used as stand-alone products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part thereof contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or any other medium capable of storing a computer program.
The above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and such modifications and replacements do not depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (15)

1. A method of identifying regions in space, the method comprising:
performing unit division on a space substance image of a space to be identified under an overhead view to obtain a unit to be identified in the space to be identified and a first substance image of the unit to be identified, wherein a plurality of objects are gathered in the space to be identified;
acquiring a second substance image of the unit to be identified at the object view angle;
performing key point clustering on key point information in the object activity information of the unit to be identified according to a preset time period to obtain a first time sequence of the unit to be identified;
respectively performing feature extraction on the first substance image, the second substance image and the first time sequence through a feature extractor in a region identification model to obtain a first depth feature, a second depth feature and a third depth feature;
and performing region identification, through a classification layer in the region identification model, on a first fusion feature obtained by feature fusion of the first depth feature, the second depth feature and the third depth feature, to obtain a region identification result of the unit to be identified.
2. The method according to claim 1, wherein the acquiring a second substance image of the unit to be identified at the object view angle specifically comprises:
performing unit connection on a plurality of partial substance images of the space to be identified at the object view angle according to the unit to be identified, to acquire the second substance image.
3. The method according to claim 1, wherein the performing key point clustering on key point information in the object activity information of the unit to be identified according to a preset time period to obtain a first time sequence of the unit to be identified comprises:
extracting key points from the object activity information of the space to be identified to obtain the key point information of the space to be identified;
performing unit connection on the key point information of the space to be identified according to the unit to be identified, to acquire the key point information of the unit to be identified;
and performing key point clustering on the key point information of the unit to be identified according to the preset time period to obtain the first time sequence.
4. The method according to claim 1, wherein the performing key point clustering on key point information in the object activity information of the unit to be identified according to a preset time period to obtain a first time sequence of the unit to be identified comprises:
performing unit connection on the object activity information of the space to be identified according to the unit to be identified, to acquire the object activity information of the unit to be identified;
extracting key points from the object activity information of the unit to be identified to obtain the key point information of the unit to be identified;
and performing key point clustering on the key point information of the unit to be identified according to the preset time period to obtain the first time sequence.
5. The method according to any one of claims 1-4, further comprising:
normalizing the first time sequence to obtain a normalized first time sequence;
and updating the first time sequence according to the normalized first time sequence.
6. The method of claim 1, wherein the feature extractor in the region identification model comprises a first convolutional neural network, a convolutional-recurrent neural network, and a recurrent-full convolutional neural network; and the respectively performing feature extraction on the first substance image, the second substance image and the first time sequence through the feature extractor in the region identification model to obtain the first depth feature, the second depth feature and the third depth feature comprises:
performing feature extraction on the first substance image through the first convolutional neural network in the region identification model to obtain the first depth feature;
performing feature extraction on the second substance image through the convolutional-recurrent neural network in the region identification model to obtain the second depth feature;
and performing feature extraction on the first time sequence through the recurrent-full convolutional neural network in the region identification model to obtain the third depth feature.
7. The method of claim 6, wherein the convolutional-recurrent neural network comprises a second convolutional neural network and a first recurrent neural network; and the performing feature extraction on the second substance image through the convolutional-recurrent neural network in the region identification model to obtain the second depth feature comprises:
performing feature extraction on the second substance image through the second convolutional neural network in the region identification model to obtain a first convolution feature;
and processing the first convolution feature through the first recurrent neural network in the region identification model to obtain the second depth feature.
8. The method of claim 6, wherein the recurrent-full convolutional neural network comprises a second recurrent neural network and a full convolutional neural network; and the performing feature extraction on the first time sequence through the recurrent-full convolutional neural network in the region identification model to obtain the third depth feature comprises:
performing feature extraction on the first time sequence through the second recurrent neural network in the region identification model to obtain a second convolution feature;
performing feature extraction on the first time sequence through the full convolutional neural network in the region identification model to obtain a time feature;
and performing fusion processing on the second convolution feature and the time feature to obtain the third depth feature.
9. The method of claim 1, wherein the training step of the region identification model comprises:
performing unit division on a space substance image of a sample space under an overhead view to obtain a sample unit in the sample space and a third substance image of the sample unit, wherein the sample unit has region tag data identifying the region to which the sample unit belongs;
acquiring a fourth substance image of the sample unit at the object view angle;
performing key point clustering on key point information in the object activity information of the sample unit according to a preset time period to obtain a second time sequence of the sample unit;
respectively performing feature extraction on the third substance image, the fourth substance image and the second time sequence through a feature extractor in a preset recognition model to obtain a fourth depth feature, a fifth depth feature and a sixth depth feature;
performing region identification, through a classification layer in the preset recognition model, on a second fusion feature obtained by feature fusion of the fourth depth feature, the fifth depth feature and the sixth depth feature, to obtain a region prediction result of the sample unit;
and training model parameters of the preset recognition model according to the region tag data, the region prediction result and a loss function of the preset recognition model to obtain the region identification model.
10. The method according to claim 9, wherein, when the difference between the numbers of sample units of different regions is larger than a preset difference value, the loss function is specifically a weighted cross-entropy loss function in which weights are used to balance the difference in numbers.
11. The method of claim 9, wherein the feature extractor in the preset recognition model comprises a first convolutional neural network, a convolutional-recurrent neural network, and a recurrent-full convolutional neural network, the convolutional-recurrent neural network comprising a second convolutional neural network and a first recurrent neural network, the method further comprising:
when training the model parameters of the preset recognition model, freezing the model parameters of the second convolutional neural network in the preset recognition model, and training the first recurrent neural network in the preset recognition model.
12. A device for identifying regions in space, the device comprising a dividing unit, an acquisition unit, a clustering unit, an extraction unit and an identification unit, wherein:
the dividing unit is configured to perform unit division on a space substance image of a space to be identified under an overhead view to obtain a unit to be identified in the space to be identified and a first substance image of the unit to be identified, wherein a plurality of objects are gathered in the space to be identified;
the acquisition unit is configured to acquire a second substance image of the unit to be identified at the object view angle;
the clustering unit is configured to perform key point clustering on key point information in the object activity information of the unit to be identified according to a preset time period to obtain a first time sequence of the unit to be identified;
the extraction unit is configured to respectively perform feature extraction on the first substance image, the second substance image and the first time sequence through a feature extractor in a region identification model to obtain a first depth feature, a second depth feature and a third depth feature;
and the identification unit is configured to perform region identification, through a classification layer in the region identification model, on a first fusion feature obtained by feature fusion of the first depth feature, the second depth feature and the third depth feature, to obtain a region identification result of the unit to be identified.
13. A computer device, comprising a processor and a memory, wherein:
the memory is used for storing a computer program and transmitting the computer program to the processor;
the processor is configured to perform the method of any of claims 1-11 according to instructions in the computer program.
14. A computer-readable storage medium, characterized in that the computer-readable storage medium is configured to store a computer program for implementing the method of any one of claims 1-11.
15. A computer program product comprising a computer program which, when run on a computer device, causes the computer device to perform the method of any of claims 1-11.
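The following sketches are illustrative readings of individual claims, not part of the claims themselves. Claims 3 to 5 leave both the clustering algorithm and the content of the first time sequence open; below, DBSCAN, a cluster-count-per-period sequence, and min-max normalization are all assumptions:

    from collections import defaultdict
    import numpy as np
    from sklearn.cluster import DBSCAN

    def first_time_sequence(keypoints, period_seconds=3600.0, eps=10.0):
        """keypoints: iterable of (timestamp, x, y) for one unit to be identified."""
        # Bucket the key points by the preset time period.
        buckets = defaultdict(list)
        for ts, x, y in keypoints:
            buckets[int(ts // period_seconds)].append((x, y))
        if not buckets:
            return np.zeros(0)
        start, end = min(buckets), max(buckets)
        seq = []
        for b in range(start, end + 1):
            pts = np.asarray(buckets.get(b, []), dtype=float)
            if len(pts) == 0:
                seq.append(0.0)
                continue
            # Cluster this period's key points; count the non-noise clusters.
            labels = DBSCAN(eps=eps, min_samples=2).fit(pts).labels_
            seq.append(float(len(set(labels) - {-1})))
        seq = np.asarray(seq)
        # Claim 5: normalize the first time sequence, then update it.
        span = seq.max() - seq.min()
        return (seq - seq.min()) / span if span > 0 else seq

Claims 3 and 4 differ only in ordering: claim 3 extracts key points for the whole space and then connects them per unit, while claim 4 connects the activity information per unit first and extracts key points afterwards; either way, the input to the function above is the per-unit key point set.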
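Claim 8's recurrent-full convolutional branch runs the first time sequence through two parallel extractors and fuses the results into the third depth feature. A hedged PyTorch reading, with layer sizes and the linear fusion as assumptions:

    import torch
    import torch.nn as nn

    class RecurrentFullConvBranch(nn.Module):
        def __init__(self, hidden: int = 128):
            super().__init__()
            # Second recurrent neural network (yields the second convolution
            # feature, in the claim's terminology).
            self.rnn = nn.GRU(1, hidden, batch_first=True)
            # Full convolutional neural network (yields the time feature).
            self.fcn = nn.Sequential(
                nn.Conv1d(1, 64, 5, padding=2), nn.ReLU(),
                nn.Conv1d(64, hidden, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            )
            self.fuse = nn.Linear(2 * hidden, hidden)

        def forward(self, time_seq):               # time_seq: (B, T, 1)
            _, h = self.rnn(time_seq)
            rnn_feat = h[-1]                       # (B, hidden)
            fcn_feat = self.fcn(time_seq.transpose(1, 2))
            # Fusing the two features yields the third depth feature.
            return self.fuse(torch.cat([rnn_feat, fcn_feat], dim=-1))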
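Claims 10 and 11 add two training details: a weighted cross-entropy loss when the sample-unit counts of different regions are imbalanced, and freezing the second convolutional neural network while the first recurrent neural network keeps training. Inverse-frequency weighting and the attribute name street_cnn (from the sketch following the description above) are assumptions:

    import torch
    import torch.nn as nn

    def region_loss(counts_per_region):
        # Claim 10: weight the cross-entropy so regions with fewer sample
        # units are not drowned out; inverse frequency is one such weighting.
        counts = torch.as_tensor(counts_per_region, dtype=torch.float)
        weights = counts.sum() / (len(counts) * counts.clamp(min=1.0))
        return nn.CrossEntropyLoss(weight=weights)

    def freeze_second_cnn(model: nn.Module):
        # Claim 11: keep the second convolutional network fixed and let the
        # first recurrent network train. street_cnn is a hypothetical name.
        for p in model.street_cnn.parameters():
            p.requires_grad = False

An optimizer built afterwards would typically skip the frozen weights, for example torch.optim.Adam(p for p in model.parameters() if p.requires_grad).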
CN202310144678.6A 2023-02-10 2023-02-10 Method and related device for identifying region in space Pending CN116958803A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310144678.6A CN116958803A (en) 2023-02-10 2023-02-10 Method and related device for identifying region in space

Publications (1)

Publication Number Publication Date
CN116958803A 2023-10-27

Family

ID=88443323

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310144678.6A Pending CN116958803A (en) 2023-02-10 2023-02-10 Method and related device for identifying region in space

Country Status (1)

Country Link
CN (1) CN116958803A (en)

Similar Documents

Publication Publication Date Title
CN110119477B (en) Information pushing method, device and storage medium
CN112329826A (en) Training method of image recognition model, image recognition method and device
CN110781413A (en) Interest point determining method and device, storage medium and electronic equipment
CN114648676A (en) Point cloud processing model training and point cloud instance segmentation method and device
CN114186076A (en) Knowledge graph construction method, device, equipment and computer readable storage medium
CN113837669B (en) Evaluation index construction method of label system and related device
CN116935188B (en) Model training method, image recognition method, device, equipment and medium
CN117036834B (en) Data classification method and device based on artificial intelligence and electronic equipment
CN112069412B (en) Information recommendation method, device, computer equipment and storage medium
CN111259975B (en) Method and device for generating classifier and method and device for classifying text
CN111260074B (en) Method for determining hyper-parameters, related device, equipment and storage medium
CN113569018A (en) Question and answer pair mining method and device
CN111797856B (en) Modeling method and device, storage medium and electronic equipment
CN116958803A (en) Method and related device for identifying region in space
CN116070696A (en) Cross-domain data deep migration method, device, equipment and storage medium
CN111814812A (en) Modeling method, modeling device, storage medium, electronic device and scene recognition method
CN114219663A (en) Product recommendation method and device, computer equipment and storage medium
CN109885504B (en) Recommendation system test method, device, medium and electronic equipment
CN111339446A (en) Interest point mining method and device, electronic equipment and storage medium
CN116776160B (en) Data processing method and related device
CN117854156B (en) Training method and related device for feature extraction model
Zhang et al. Visual retrieval of digital media image features based on active noise control
CN117218461A (en) Visual relation detection method and related device based on image
CN114580635A (en) Neural network model cutting method based on embedded AIOT platform
CN113836358A (en) Data processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication