WO2021097845A1 - Image generation method for a simulation scene, electronic device, and storage medium - Google Patents
Image generation method for a simulation scene, electronic device, and storage medium
- Publication number
- WO2021097845A1 (PCT/CN2019/120408)
- Authority
- WO
- WIPO (PCT)
Classifications
- G06T 11/00: 2D [two-dimensional] image generation
- G06N 3/045: Computing arrangements based on biological models; neural networks; architecture; combinations of networks
- G06N 3/08: Computing arrangements based on biological models; neural networks; learning methods
- Y02T 10/40: Climate change mitigation technologies related to transportation; ICE-based vehicles; engine management systems
Definitions
- the embodiments of the present disclosure relate to the field of simulation technology, and in particular to a method for generating an image of a simulated scene, an electronic device, and a storage medium.
- Simulation is an important means of technology exploration and verification testing in the current research and development of artificial intelligence technologies such as intelligent driving and robotics.
- Simulation scenarios can generate massive training data for training computer vision algorithms (target detection and recognition, segmentation, tracking, etc.) and decision-making algorithms (imitation learning, reinforcement learning, etc.), and they provide almost unlimited scenarios for later algorithm verification and testing.
- To this end, simulation scenarios need to be built.
- The current process of building a simulation scenario is as follows: first, considerable manpower and material resources are spent on site surveying and mapping; then a model is manually built in the simulation engine based on the surveying and mapping data, and details such as color, texture, and lighting are refined. The construction process is therefore cumbersome, time-consuming, labor-intensive, and inefficient; the resulting simulation scene scales poorly; and rendering in the simulation engine places high demands on device hardware and software.
- At least one embodiment of the present disclosure provides a method for generating an image of a simulated scene, an electronic device, and a storage medium.
- In a first aspect, an embodiment of the present disclosure proposes a method for generating an image of a simulation scene. The method includes: acquiring semantic segmentation information and instance segmentation information of a scene white model;
- receiving instance text information of the scene white model, where the instance text information is editable information used to describe the attributes of an instance; and
- generating an image of the simulation scene based on the semantic segmentation information, the instance segmentation information, the instance text information, and a pre-trained generative adversarial network.
- In a second aspect, the embodiments of the present disclosure also provide an electronic device, including a processor and a memory; the processor is configured to execute the steps of the method described in the first aspect by calling a program or instruction stored in the memory.
- In a third aspect, the embodiments of the present disclosure also provide a non-transitory computer-readable storage medium for storing a program or instruction, the program or instruction causing a computer to execute the steps of the method described in the first aspect.
- In the embodiments of the present disclosure, only the scene white model needs to be established; an image of the simulated scene can then be generated based on the semantic segmentation information and instance segmentation information of the scene white model, without refining attributes such as color, texture, and lighting during scene creation, which improves generation efficiency. Moreover, the instance text information is editable, and different instance text information describes the attributes of different instances and corresponds to different instances, which diversifies the simulation scene.
- FIG. 1 is a schematic diagram of a simulation scenario provided by an embodiment of the present disclosure
- Figure 2 is a block diagram of an electronic device provided by an embodiment of the present disclosure
- Fig. 3 is a block diagram of a simulation scene image generation system provided by an embodiment of the present disclosure
- FIG. 4 is a flowchart of an image generation method of a simulation scene provided by an embodiment of the present disclosure
- FIG. 5 is an architecture diagram of an autoencoder network provided by an embodiment of the present disclosure.
- FIG. 6 is an architecture diagram of a generative adversarial network provided by an embodiment of the present disclosure.
- FIG. 7 is an architecture diagram of a discriminant network provided by an embodiment of the present disclosure.
- The current process of constructing a simulation scene is as follows: first, considerable manpower and material resources are spent on site surveying and mapping; then a model is manually established in the simulation engine based on the surveying and mapping data, and details such as color, texture, and lighting are refined. The construction process is therefore cumbersome, time-consuming, labor-intensive, and inefficient; the resulting simulation scene scales poorly; and rendering in the simulation engine places high demands on device hardware and software.
- The embodiments of the present disclosure provide an image generation solution for a simulation scene. Only a scene white model needs to be established; an image of the simulation scene can then be generated based on the semantic segmentation information and instance segmentation information of the scene white model, without refining attributes such as color, texture, and lighting during scene creation, which improves generation efficiency. Moreover, the instance text information is editable, and different instance text information describes the attributes of different instances and corresponds to different instances, which diversifies the simulation scene.
- the image generation solution of the simulated scene provided by the embodiments of the present disclosure can be applied to electronic devices.
- the simulation scene is, for example, an intelligent driving simulation scene
- the simulation scene is, for example, a simulation scene generated by a simulation engine.
- the simulation engine may include, but is not limited to, Unreal Engine, Unity, etc.
- Fig. 1 is a schematic diagram of a simulation scenario provided by an embodiment of the present disclosure.
- the simulation scenario may include, but is not limited to: static objects such as green belts, sidewalks, motor vehicle lanes, street lights, trees, and other facilities found in the real environment, as well as dynamic objects such as at least one virtual vehicle 101, an intelligent driving vehicle 102, and pedestrians.
- the virtual vehicle 101 may include a path-finding system and other systems for driving.
- the virtual vehicle 101 may include: a path-finding system, a perception system, a decision-making system, a control system, and other systems for driving.
- the path-finding system is used to construct a road network topology and to find paths based on the constructed road network topology.
- in some embodiments, the path-finding system is used to obtain a high-precision map and, based on the high-precision map, construct the road network topology.
- the high-precision map is a geographic map used in the field of intelligent driving
- the high-precision map is a map describing a simulation scene.
- high-precision maps differ from ordinary maps in that: 1) they include a large amount of driving-assistance information, such as an accurate three-dimensional representation of the road network, including intersections and the positions of road signs; 2) they include a large amount of semantic information, such as the meaning of the different colors of traffic lights, road speed limits, and where a left-turn lane starts; 3) they can reach centimeter-level accuracy, ensuring the safe driving of intelligent driving vehicles. Therefore, the path generated by the path-finding system can provide a richer basis for planning and decision-making, such as the number of lanes at the current location, their width and orientation, and the positions of various traffic attachments.
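- As a concrete illustration of the driving-assistance content such a map entry might carry, here is a minimal sketch; the schema and all field names are illustrative assumptions, not the patent's data format:

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class LaneRecord:
    """Hypothetical high-precision-map lane entry (illustrative only)."""
    lane_id: str
    num_lanes: int                                   # lanes at the current location
    width_m: float                                   # lane width
    heading_deg: float                               # lane orientation
    speed_limit_kmh: float                           # semantic info: road speed limit
    traffic_light_colors: List[str] = field(default_factory=list)   # semantic info
    left_turn_start: Optional[Tuple[float, float]] = None           # where a left-turn lane begins
    centerline: List[Tuple[float, float, float]] = field(default_factory=list)  # cm-accurate 3D points
```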
- the perception system is used for collision detection (Collision Detection). In some embodiments, the perception system is used to perceive obstacles in a simulated scene.
- the decision-making system is used to decide the driving behavior of the virtual vehicle 101 based on the wayfinding path generated by the wayfinding system, the obstacles sensed by the perception system, and the kinematics information of the virtual vehicle 101 through a preset behavior tree (Behavior Tree).
- the kinematics information includes, but is not limited to, speed, acceleration, and other motion-related information.
- the control system is used to control the driving behavior of the virtual vehicle 101 based on the decision made by the decision system, and feed back the kinematics information of the virtual vehicle 101 to the decision system.
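- For illustration, a decision step driven by a behavior tree could look like the minimal sketch below; the Selector/Sequence node types and the tick protocol are generic behavior-tree conventions, and the scenario logic is hypothetical, not the patent's implementation:

```python
class Node:
    def tick(self, ctx) -> bool: ...

class Selector(Node):
    """Runs children in order until one succeeds."""
    def __init__(self, *children): self.children = children
    def tick(self, ctx): return any(c.tick(ctx) for c in self.children)

class Sequence(Node):
    """Runs children in order until one fails."""
    def __init__(self, *children): self.children = children
    def tick(self, ctx): return all(c.tick(ctx) for c in self.children)

class Condition(Node):
    def __init__(self, pred): self.pred = pred
    def tick(self, ctx): return self.pred(ctx)

class Action(Node):
    def __init__(self, fn): self.fn = fn
    def tick(self, ctx): self.fn(ctx); return True

# Decide driving behavior from the perceived obstacles and the planned path.
tree = Selector(
    Sequence(Condition(lambda c: c["obstacle_ahead"]),
             Action(lambda c: c.update(cmd="brake"))),
    Action(lambda c: c.update(cmd="follow_path")),
)
ctx = {"obstacle_ahead": False, "speed": 12.0}
tree.tick(ctx)   # ctx["cmd"] == "follow_path"
```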
- each system in the virtual vehicle 101 is only a logical function division, and there may be other division methods in actual implementation.
- the function of the path-finding system can be integrated into the perception system, the decision-making system, or the control system; any two or more systems can also be implemented as one system; and any one system can also be divided into multiple subsystems.
- each system or subsystem can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are performed by hardware or software depends on the specific application and the design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each specific application.
- the intelligent driving vehicle 102 at least includes: a sensor group and an intelligent driving system.
- the sensor group is used to collect the data of the external environment of the vehicle and detect the position data of the vehicle.
- the sensor group is also used to collect dynamic data of the vehicle.
- the intelligent driving system is used to obtain the data of the sensor group, perform environmental perception and vehicle positioning based on that data, perform path planning and decision-making based on the environmental perception information and vehicle positioning information, and generate vehicle control instructions based on the planned path so as to control the vehicle to travel along the planned path.
- since the virtual vehicle 101 and the intelligent driving vehicle 102 are both generated in the simulation scenario and are not real vehicles, they can be controlled by a background processor, which may be a server, computer, tablet computer, or another hardware device with processing capability.
- FIG. 2 is a block diagram of an electronic device provided by an embodiment of the disclosure.
- the electronic equipment can support the operation of the simulation system.
- the simulation system can provide simulation scenarios and generate virtual vehicles and provide other functions for simulation.
- the simulation system may be a simulation system based on a simulation engine.
- the electronic device includes: at least one processor 201, at least one memory 202, and at least one communication interface 203.
- the various components in the electronic device are coupled together through the bus system 204.
- the communication interface 203 is used for information transmission with external devices.
- the bus system 204 is used to implement communication connections between these components.
- the bus system 204 also includes a power bus, a control bus, and a status signal bus.
- for clarity, the various buses are all labeled as the bus system 204 in FIG. 2.
- the memory 202 in this embodiment may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memory.
- the memory 202 stores the following elements, executable units or data structures, or a subset of them, or an extended set of them: operating systems and applications.
- the operating system includes various system programs, such as a framework layer, a core library layer, and a driver layer, which are used to implement various basic services and process hardware-based tasks.
- the application programs include various application programs, such as a media player and a browser, and are used to implement various application services.
- a program that implements the method for generating an image of a simulation scene provided by an embodiment of the present disclosure may be included in an application program.
- the processor 201 executes the steps of each embodiment of the method for generating an image of a simulation scene provided by the embodiments of the present disclosure by calling a program or instruction stored in the memory 202, specifically a program or instruction stored in an application program.
- the method for generating an image of a simulated scene may be applied to the processor 201 or implemented by the processor 201.
- the processor 201 may be an integrated circuit chip with signal processing capability. In the implementation process, the steps of the foregoing method can be completed by an integrated logic circuit of hardware in the processor 201 or instructions in the form of software.
- the aforementioned processor 201 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
- the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
- the steps of the method for generating an image of a simulated scene may be embodied directly as being executed and completed by a hardware decoding processor, or by a combination of hardware and software units in a decoding processor.
- the software unit may be located in a storage medium mature in the field, such as a random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or register.
- the storage medium is located in the memory 202, and the processor 201 reads the information in the memory 202 and completes the steps of the method in combination with its hardware.
- FIG. 3 is a block diagram of a simulation scene image generation system 300 provided by an embodiment of the disclosure.
- the simulation scene image generation system 300 may be implemented as a system running in the electronic device shown in FIG. 2 or a part of a simulation system running in the electronic device.
- the simulation scene image generation system may be stored in the memory 202 of the electronic device shown in FIG. 2.
- the processor 201 in FIG. 2 implements the functions of the units included in the simulation scene image generation system 300 by calling the simulation scene image generation system 300 stored in the memory 202.
- the simulation scene image generation system 300 may be applied to the processor 201 of the electronic device shown in FIG. 2 or implemented by the processor 201.
- the units of the simulation scene image generation system 300 can be completed by hardware integrated logic circuits in the processor 201 or instructions in the form of software.
- the simulation scene image generation system 300 may be divided into multiple units, for example, it may include but not limited to: an acquisition unit 301, a receiving unit 302, and a generating unit 303.
- the acquiring unit 301 is configured to acquire semantic segmentation information and instance segmentation information of the scene white model.
- the scene white model can be understood as a scene model without adding color, texture, lighting and other attribute information.
- the scene white model is established by a simulation engine, and the semantic segmentation information and instance segmentation information of the scene white model are generated by the simulation engine based on the scene white model.
- the scene white model is manually established in the simulation engine, and there is no need to manually add attribute information such as color, texture, and lighting; the simulation engine can automatically generate semantic segmentation information and instance segmentation information based on the scene white model.
- the semantic segmentation information is used to distinguish or describe different types of objects in the simulation scene (people, cars, animals, buildings, etc.); the instance segmentation information is used to distinguish or describe individual objects in the simulation scene (different people, different cars, different animals, different buildings, etc.). That is, for an object in the simulation scene, the semantic segmentation information indicates whether the object is a person or a car; if it is a car, the instance segmentation information indicates whether the car is an Audi or a Volkswagen; and the instance text information indicates whether the car is a white car or a black car.
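- To make the distinction concrete, the sketch below shows one common way the two kinds of segmentation information are represented as network inputs; the one-hot and boundary-map encodings are assumptions borrowed from pix2pixHD-style pipelines, not something the patent mandates:

```python
import torch
import torch.nn.functional as F

H, W, NUM_CLASSES = 256, 512, 20

# Semantic segmentation: one class label per pixel (person, car, building, ...).
semantic_labels = torch.randint(0, NUM_CLASSES, (H, W))
semantic_map = F.one_hot(semantic_labels, NUM_CLASSES).permute(2, 0, 1).float()  # (C, H, W)

# Instance segmentation: a unique id per object (car #1, car #2, ...).
instance_ids = torch.randint(0, 50, (H, W))

def instance_boundary_map(ids: torch.Tensor) -> torch.Tensor:
    """1 where a pixel's instance id differs from a neighbor's, else 0."""
    edge = torch.zeros_like(ids, dtype=torch.float32)
    edge[:, 1:] += (ids[:, 1:] != ids[:, :-1]).float()
    edge[1:, :] += (ids[1:, :] != ids[:-1, :]).float()
    return edge.clamp(max=1.0).unsqueeze(0)  # (1, H, W)

instance_map = instance_boundary_map(instance_ids)
```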
- the receiving unit 302 is configured to receive instance text information of the scene white model; the instance text information is editable information and used to describe the attributes of the instance. By changing the content of the instance text information, the editing of instance attributes is realized, and different instance attributes correspond to different instances.
- the instance text information of the scene white model is manually input, and in the process of manually inputting the instance text information, the content of the instance text information can be edited, and the receiving unit 302 receives the manually input instance text information.
- because the instance text information describes the attributes of an instance, setting it as editable information makes the attributes of the instance editable.
- the simulation scene is thus a scene whose instance attributes are editable.
- the attributes of the instance may include, but are not limited to, color, texture, lighting, and the like.
- the generating unit 303 is configured to generate an image of a simulation scene based on semantic segmentation information, instance segmentation information, instance text information, and pre-trained Generative Adversarial Networks (GAN).
- the instance text information is not used directly as input to the generative adversarial network; instead, the generation unit 303 generates a feature map based on the instance text information and at least one real image corresponding to the scene white model. Note that real images are only provided during the training process.
- the generating unit 303 generates an image of the simulation scene through the pre-trained generative adversarial network based on the semantic segmentation information, the instance segmentation information, and the feature map.
- the generation unit 303 cascades the semantic segmentation information, the instance segmentation information, and the feature map (essentially a vector concatenation, e.g., concatenation along the channel dimension, or element-wise addition) and then inputs the result into the pre-trained generative adversarial network to generate an image of the simulation scene; a minimal sketch of this cascading follows.
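- A minimal sketch of that cascading, assuming channel-dimension concatenation and illustrative tensor shapes:

```python
import torch

semantic_map = torch.randn(1, 20, 256, 512)   # one-hot semantic channels
instance_map = torch.randn(1, 1, 256, 512)    # instance boundary channel
feature_map  = torch.randn(1, 3, 256, 512)    # feature map from the autoencoder

# Cascade in the channel dimension before feeding the generative adversarial network ...
gan_input = torch.cat([semantic_map, instance_map, feature_map], dim=1)  # (1, 24, 256, 512)

# ... or, when channel counts match, element-wise addition is an alternative.
added = feature_map + torch.randn(1, 3, 256, 512)
```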
- the feature map input allows the generative adversarial network to adjust attributes such as the color, texture, and lighting of the instances in the scene.
- the generating unit 303 generates the image of the simulation scene as a high-resolution image, so the simulation scene is a high-resolution scene, which facilitates technology exploration and verification testing during artificial intelligence research and development.
- the generating unit 303 generates the feature map based on the instance text information and the at least one real image corresponding to the scene white model as follows: the instance text information undergoes embedding processing and conditioning augmentation processing to obtain a processing result; the at least one real image corresponding to the scene white model is encoded to obtain the hidden variable corresponding to each real image, where a hidden variable can be understood as an intermediate variable and one image corresponds to one hidden variable; and the hidden variable corresponding to each real image is sampled to obtain sampling results.
- hidden-variable sampling is used to adjust the attribute information of the instances in the simulation scene, diversifying the images of the simulation scene; the processing result and the sampling results are then decoded to generate the feature map.
- the generation unit 303 performs embedding processing and conditioning augmentation processing on the instance text information as follows: the instance text information is input into a pre-trained embedding network, and the output of the embedding network is passed through a pre-trained Conditioning Augmentation network to obtain the processing result.
- the embedding network and the conditioning augmentation network are both neural networks whose parameters are obtained through pre-training.
- the generating unit 303 inputs the at least one real image corresponding to the scene white model into the encoder of a pre-trained autoencoder network for encoding, obtaining the hidden variable corresponding to each real image;
- the autoencoder network samples the hidden variable corresponding to each real image to obtain the sampling result;
- the decoder of the autoencoder network decodes the processing result and the sampling result to generate the feature map.
- in some embodiments, the autoencoder network is a variational autoencoder network.
- the architecture of the autoencoder network is shown in FIG. 5; it includes a convolutional layer and a deconvolutional layer, where the convolutional layer can be understood as the encoder of the autoencoder network and the deconvolutional layer as its decoder.
- the input information of the autoencoder network is the at least one real image corresponding to the scene white model, that is, the input of the convolutional layer of the autoencoder network is the at least one real image corresponding to the scene white model.
- the output information of the autoencoder network is the feature map, that is, the output of the deconvolutional layer of the autoencoder network is the feature map.
- the instance text information is input into the pre-trained embedding network, and the output of the embedding network is a set of low-dimensional vectors.
- the output of the embedding network is enhanced through the pre-trained conditioning augmentation network to obtain the processing result.
- the autoencoder network samples the hidden variable corresponding to each real image to obtain the sampling results.
- the processing result and the sampling result are cascaded (essentially a vector concatenation, e.g., concatenation along the channel dimension, or element-wise addition) and then input to the deconvolutional layer of the autoencoder network for decoding, generating the feature map; a minimal sketch of this pipeline follows.
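- The sketch below ties the pieces together: a conditioning-augmentation module in the StackGAN style (an assumption; the patent does not specify the exact form) and a small variational autoencoder whose convolutional encoder produces a hidden variable per image, samples it, and decodes it together with the text condition into a feature map. All module and parameter names are illustrative:

```python
import torch
import torch.nn as nn

class ConditioningAugmentation(nn.Module):
    """Maps a text embedding to a Gaussian and samples from it (StackGAN-style assumption)."""
    def __init__(self, embed_dim=128, cond_dim=64):
        super().__init__()
        self.fc = nn.Linear(embed_dim, cond_dim * 2)
    def forward(self, text_embed):
        mu, logvar = self.fc(text_embed).chunk(2, dim=1)
        return mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterized sample

class FeatureMapAutoencoder(nn.Module):
    """Encodes a real image to a hidden variable, samples it, decodes with the text condition."""
    def __init__(self, cond_dim=64, latent_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(                      # convolutional layers = encoder
            nn.Conv2d(3, 32, 4, 2, 1), nn.ReLU(),
            nn.Conv2d(32, latent_dim * 2, 4, 2, 1))
        self.decoder = nn.Sequential(                      # deconvolutional layers = decoder
            nn.ConvTranspose2d(latent_dim + cond_dim, 32, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, 2, 1))
    def forward(self, real_image, cond):
        mu, logvar = self.encoder(real_image).chunk(2, dim=1)          # hidden variable per image
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()           # sample the hidden variable
        cond_map = cond[:, :, None, None].expand(-1, -1, *z.shape[2:]) # broadcast text condition
        return self.decoder(torch.cat([z, cond_map], dim=1))           # feature map

text_embed = torch.randn(1, 128)             # stand-in output of the pre-trained embedding network
cond = ConditioningAugmentation()(text_embed)
feature_map = FeatureMapAutoencoder()(torch.randn(1, 3, 256, 512), cond)  # (1, 3, 256, 512)
```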
- the generative adversarial network used by the generating unit 303 includes a generative network and a discriminant network.
- the generative network is composed of multiple nested generators, where each generator includes a convolutional layer and a deconvolutional layer.
- the output of the last feature-map layer of the deconvolutional layer of an inner (nested) generator serves as an input to the deconvolutional layer of the generator nested outside it.
- since the discriminant network is mainly used to train the generative network, and after training is complete the generative network independently generates the image of the simulation scene, the following description of the generative network's function refers to the generative adversarial network rather than to the generative network alone.
- that is, where the image of the simulation scene is said to be generated by the generative adversarial network, those skilled in the art will understand that the image is generated by the generative network of the generative adversarial network.
- describing the discriminant network separately does not mean that the discriminant network is not part of the generative adversarial network.
- the generating unit 303 is specifically configured to: input the semantic segmentation information, the instance segmentation information, and the feature map into the convolutional layer of the outermost generator of the generative adversarial network; down-sample the semantic segmentation information, instance segmentation information, and feature map and input them into the convolutional layers of the inner generators of the generative adversarial network; and output the image of the simulation scene from the deconvolutional layer of the outermost generator of the generative adversarial network.
- the down-sampling multiples corresponding to different inner generators may differ.
- the inputs of the nested inner generators are down-sampled, so the resolution of their outputs is reduced and the overall (global) information of the output is attended to.
- the output of the deconvolutional layer of the outermost generator is the output of the generative adversarial network; it has a higher resolution and attends to the detailed information of the output. Overall, the images of the simulation scene output by the generative adversarial network attend to both the whole and the details.
- the architecture of the generative adversarial network is shown in FIG. 6; it consists of N (N ≥ 3) nested generators, denoted, from the inside out, generator 1, generator 2, ..., generator N.
- Each generator includes a convolutional layer and a deconvolutional layer.
- the input information of the generative adversarial network is the semantic segmentation information, the instance segmentation information, and the feature map.
- the output information of the generative adversarial network is the image of the simulation scene, that is, the output of the deconvolutional layer of generator N is the image of the simulation scene.
- the input information of the generative adversarial network is input to the convolutional layer of generator N.
- the input information of the generative adversarial network is down-sampled and then input to the convolutional layer of generator 2.
- the input information of the generative adversarial network is down-sampled again and then input to the convolutional layer of generator 1.
- the purpose of down-sampling is to reduce resolution, for example by a ratio of 1/2 × 1/2: if the output of generator N is 200 × 200, then the output of generator 2 is 100 × 100 and the output of generator 1 is 50 × 50. Generator N thus works at high resolution and attends more to details, while generator 2 and generator 1 work at low resolution and attend more to the whole. The high-definition images output by the generative adversarial network are therefore more plausible, attending to both the whole and the details; a minimal sketch of such a nested generator follows.
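- A minimal sketch of such a nested, coarse-to-fine generative network, assuming pix2pixHD-style nesting in which the inner generator's last feature map is added into the outer generator's deconvolution path; layer sizes and names are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Generator(nn.Module):
    """One generator: a convolutional (downsampling) and a deconvolutional (upsampling) half."""
    def __init__(self, in_ch, feat=64):
        super().__init__()
        self.conv = nn.Sequential(nn.Conv2d(in_ch, feat, 3, 1, 1), nn.ReLU(),
                                  nn.Conv2d(feat, feat, 3, 2, 1), nn.ReLU())
        self.deconv = nn.Sequential(nn.ConvTranspose2d(feat, feat, 4, 2, 1), nn.ReLU(),
                                    nn.Conv2d(feat, 3, 3, 1, 1), nn.Tanh())
    def forward(self, x, inner_feat=None):
        h = self.conv(x)
        if inner_feat is not None:          # last feature map of the inner generator
            h = h + inner_feat              # feeds the outer deconvolution path
        return self.deconv(h), h

def nested_forward(generators, gan_input):
    """generators[0] is innermost; each level sees a further 1/2 x 1/2 downsampled input."""
    n = len(generators)
    feat, image = None, None
    for level, g in enumerate(generators):
        scale = 0.5 ** (n - 1 - level)                      # innermost gets the smallest input
        x = F.interpolate(gan_input, scale_factor=scale) if scale < 1 else gan_input
        image, feat = g(x, feat)
        if scale < 1:                                       # align inner features to outer scale
            feat = F.interpolate(feat, scale_factor=2)
    return image                                            # outermost output: the scene image

gens = nn.ModuleList([Generator(24) for _ in range(3)])     # N = 3 nested generators
image = nested_forward(gens, torch.randn(1, 24, 200, 200))  # (1, 3, 200, 200)
```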
- the generative adversarial network, the embedding network, the conditioning augmentation network, and the autoencoder network used by the generating unit 303 are obtained through joint training.
- the joint training may include: acquiring semantic segmentation information, instance segmentation information, instance text information, and sample images of the sample scene; and then performing joint training based on the semantic segmentation information, instance segmentation information, instance text information, and sample images.
- the joint training of the generative adversarial network, the embedding network, the conditioning augmentation network, and the autoencoder network based on the semantic segmentation information, instance segmentation information, instance text information, and sample images proceeds as follows:
- the instance text information is input into the embedding network, whose output passes through the conditioning augmentation network to obtain a processing result; the sample images are input into the encoder of the autoencoder network to obtain the hidden variable of each sample image, the hidden variables are sampled, and the decoder decodes the processing result and the sampling result into a feature map; the semantic segmentation information, the instance segmentation information, and the feature map are input to the generators, and the deconvolutional layer of the outermost generator outputs a generated image; the generated image, the sample images, the semantic segmentation information, the instance segmentation information, and the feature map are then passed through the discriminant network to complete training.
- the generated images output by the generative adversarial network are fake pictures,
- and their feature values are marked as "fake" to indicate that they are generated pictures rather than real pictures.
- the sample images are real pictures, and their feature values can be marked as "real". The generated image, sample image, semantic segmentation information, instance segmentation information, and feature map are passed to the discriminant network.
- through training, the discriminant network learns to judge real and fake pictures more accurately, thereby providing feedback to the generative adversarial network so that it generates fake pictures that pass for real.
- the discriminant network continues training while the judgment probability value output by any discriminator has not converged to 0.5, iterating until the training goal is met.
- the "training goal" may be a preset target for whether the images generated by the generative adversarial network meet requirements.
- the training goal of the generative adversarial network may be, for example, that the predicted feature value of a picture meets a specified requirement, for example converging to 0.5; once convergence to 0.5 is satisfied, training stops.
- the discriminant network is composed of multiple cascaded discriminators; the input to the topmost discriminator is the generated image, the sample image, the semantic segmentation information, the instance segmentation information, and the feature map; the generated image, sample image, semantic segmentation information, instance segmentation information, and feature map are down-sampled and then input to the lower-level discriminators, where the down-sampling multiples corresponding to discriminators at different levels may differ.
- the architecture of the discriminant network is shown in FIG. 7; it is composed of N (N ≥ 3) cascaded discriminators, denoted, from top to bottom, discriminator 1, discriminator 2, ..., discriminator N.
- the input information of the discriminant network is the generated image, the sample image, the semantic segmentation information, the instance segmentation information, and the feature map.
- the output information of the discriminant network is the judgment probability values.
- the input information of the discriminant network is input to discriminator 1.
- the input information of the discriminant network is down-sampled and then input to discriminator 2.
- the input information of the discriminant network is down-sampled again and then input to discriminator N. If the judgment probability values output by discriminator 1, discriminator 2, ..., and discriminator N all converge to 0.5, the joint training ends; a minimal sketch of such a cascaded discriminant network follows.
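- The sketch below shows a cascaded multi-scale discriminant network in this spirit. One simplification: the patent lists both the generated image and the sample image among the discriminator inputs, while this sketch follows the common conditional-GAN convention of judging one image at a time against the conditioning maps; all names and layer sizes are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Discriminator(nn.Module):
    """One discriminator level: judges an image against the conditioning maps at one scale."""
    def __init__(self, in_ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 1, 4, 1, 1), nn.Sigmoid())       # per-patch probability in [0, 1]
    def forward(self, x):
        return self.net(x).mean()                          # scalar judgment probability

def discriminate(discs, image, semantic, instance, feature):
    """Discriminator 1 sees full resolution; each lower level a further-downsampled copy."""
    x = torch.cat([image, semantic, instance, feature], dim=1)
    probs = []
    for level, d in enumerate(discs):
        xi = F.interpolate(x, scale_factor=0.5 ** level) if level else x
        probs.append(d(xi))
    return probs                                           # training ends when all converge to 0.5

discs = nn.ModuleList([Discriminator(3 + 20 + 1 + 3) for _ in range(3)])   # N = 3 levels
probs = discriminate(discs, torch.randn(1, 3, 256, 512), torch.randn(1, 20, 256, 512),
                     torch.randn(1, 1, 256, 512), torch.randn(1, 3, 256, 512))
```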
- the simulation scene image generation system 300 may be a software system, a hardware system, or a combination of software and hardware.
- the simulation scene image generation system 300 is a software system running on an operating system
- the hardware system of an electronic device is a hardware system supporting the operation of the operating system.
- the division of each unit in the simulation scene image generation system 300 is only a logical function division.
- for example, the acquiring unit 301, the receiving unit 302, and the generating unit 303 may be implemented as one unit; any of the acquiring unit 301, the receiving unit 302, or the generating unit 303 can also be divided into multiple sub-units.
- each unit or subunit can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are performed by hardware or software depends on the specific application and the design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each specific application.
- FIG. 4 is a flowchart of a method for generating an image of a simulation scene provided by an embodiment of the disclosure.
- the execution body of the method is an electronic device.
- the execution body of the method is a simulation scene image generation system running in the electronic device; or the execution body of the method is a simulation system running in the electronic device, where the simulation The scene image generation system can be a part of the simulation system.
- the image generation method of the simulation scene may include, but is not limited to, the following steps 401 to 403: step 401, acquiring semantic segmentation information and instance segmentation information of the scene white model; step 402, receiving instance text information of the scene white model; and step 403, generating an image of the simulation scene based on the semantic segmentation information, the instance segmentation information, the instance text information, and a pre-trained generative adversarial network.
- the scene white model can be understood as a scene model without adding color, texture, lighting and other attribute information.
- the scene white model is established by a simulation engine, and the semantic segmentation information and instance segmentation information of the scene white model are generated by the simulation engine based on the scene white model.
- the scene white model is manually established in the simulation engine, and there is no need to manually add attribute information such as color, texture, and lighting; the simulation engine can automatically generate semantic segmentation information and instance segmentation information based on the scene white model.
- the semantic segmentation information is used to distinguish or describe different types of objects in the simulation scene (people, cars, animals, buildings, etc.); the instance segmentation information is used to distinguish or describe individual objects in the simulation scene (different people, different cars, different animals, different buildings, etc.). That is, for an object in the simulation scene, the semantic segmentation information indicates whether the object is a person or a car; if it is a car, the instance segmentation information indicates whether the car is an Audi or a Volkswagen; and the instance text information indicates whether the car is a white car or a black car.
- the instance text information of the scene white model is editable information and is used to describe the attributes of the instance.
- the instance text information of the scene white model is manually input, and in the process of manually inputting the instance text information, the content of the instance text information can be edited, and step 402 receives the manually input instance text information.
- because the instance text information describes the attributes of an instance, setting it as editable information makes the attributes of the instance editable.
- the simulation scene is thus a scene whose instance attributes are editable.
- the attributes of the instance may include, but are not limited to, color, texture, lighting, and the like.
- in step 403, an image of the simulation scene is generated based on the semantic segmentation information, the instance segmentation information, the instance text information, and the pre-trained generative adversarial network.
- the instance text information is not used directly as input to the generative adversarial network; instead, a feature map is generated based on the instance text information and at least one real image corresponding to the scene white model.
- based on the semantic segmentation information, the instance segmentation information, and the feature map, an image of the simulation scene is generated through the pre-trained generative adversarial network.
- the semantic segmentation information, the instance segmentation information, and the feature map are cascaded (essentially a vector concatenation) and then input into the pre-trained generative adversarial network to generate an image of the simulated scene.
- the feature map input allows the generative adversarial network to adjust attributes such as the color, texture, and lighting of the instances in the scene.
- in some embodiments, the generated image of the simulation scene is a high-resolution image,
- so the simulation scene is a high-resolution scene, which facilitates technology exploration and verification testing during artificial intelligence research and development.
- the feature map is generated based on the instance text information and the at least one real image corresponding to the scene white model as follows: the instance text information undergoes embedding processing and conditioning augmentation processing to obtain a processing result; the at least one real image corresponding to the scene white model is encoded to obtain the hidden variable corresponding to each real image.
- a hidden variable can be understood as an intermediate variable, and one image corresponds to one hidden variable; the hidden variable corresponding to each real image is sampled to obtain sampling results, where hidden-variable sampling adjusts the attribute information of the instances in the simulation scene to diversify the images of the simulation scene; the processing result and the sampling results are then decoded to generate the feature map.
- the instance text information is embedded and conditioning-augmented to obtain the processing result as follows: the instance text information is input into a pre-trained embedding network, and the output of the embedding network is enhanced by a pre-trained Conditioning Augmentation network to obtain the processing result.
- the embedding network and the conditioning augmentation network are both neural networks whose parameters are obtained through pre-training.
- the at least one real image corresponding to the scene white model is input into the encoder of a pre-trained autoencoder network for encoding, obtaining the hidden variable corresponding to each real image; the autoencoder network samples the hidden variable corresponding to each real image to obtain the sampling result; and the decoder of the autoencoder network decodes the processing result and the sampling result to generate the feature map.
- in some embodiments, the autoencoder network is a variational autoencoder network.
- the architecture of the autoencoder network is shown in FIG. 5; it includes a convolutional layer and a deconvolutional layer, where the convolutional layer can be understood as the encoder of the autoencoder network and the deconvolutional layer as its decoder.
- the input information of the autoencoder network is the at least one real image corresponding to the scene white model, that is, the input of the convolutional layer of the autoencoder network is the at least one real image corresponding to the scene white model.
- the output information of the autoencoder network is the feature map, that is, the output of the deconvolutional layer of the autoencoder network is the feature map.
- the instance text information is input into the pre-trained embedding network, and the output of the embedding network is a set of low-dimensional vectors.
- the output of the embedding network is enhanced through the pre-trained conditioning augmentation network to obtain the processing result.
- the autoencoder network samples the hidden variable corresponding to each real image to obtain the sampling results.
- the processing result and the sampling result are cascaded (essentially a vector concatenation) and then input to the deconvolutional layer of the autoencoder network for decoding, generating the feature map.
- the generative adversarial network includes a generative network and a discriminant network, where the generative network is composed of multiple nested generators; each generator includes a convolutional layer and a deconvolutional layer, and the output of the last feature-map layer of an inner generator's deconvolutional layer serves as an input to the deconvolutional layer of the generator nested outside it.
- since the discriminant network is mainly used to train the generative network, and after training is complete the generative network independently generates the image of the simulation scene, the following description of the generative network's function refers to the generative adversarial network rather than to the generative network alone.
- that is, where the image of the simulation scene is said to be generated by the generative adversarial network, those skilled in the art will understand that the image is generated by the generative network of the generative adversarial network.
- describing the discriminant network separately does not mean that the discriminant network is not part of the generative adversarial network.
- the semantic segmentation information, the instance segmentation information, and the feature map are input into the convolutional layer of the outermost generator of the generative adversarial network; the semantic segmentation information, instance segmentation information, and feature map are down-sampled and input into the convolutional layers of the inner generators of the generative adversarial network; and the deconvolutional layer of the outermost generator of the generative adversarial network outputs the image of the simulation scene.
- the down-sampling multiples corresponding to different inner generators may differ.
- the inputs of the nested inner generators are down-sampled, so the resolution of their outputs is reduced and the overall (global) information of the output is attended to.
- the output of the deconvolutional layer of the outermost generator is the output of the generative adversarial network; it has a higher resolution and attends to the detailed information of the output. Overall, the images of the simulation scene output by the generative adversarial network attend to both the whole and the details.
- the architecture of the generative adversarial network is shown in FIG. 6; it consists of N (N ≥ 3) nested generators, denoted, from the inside out, generator 1, generator 2, ..., generator N.
- Each generator includes a convolutional layer and a deconvolutional layer.
- the input information of the generative adversarial network is the semantic segmentation information, the instance segmentation information, and the feature map.
- the output information of the generative adversarial network is the image of the simulation scene, that is, the output of the deconvolutional layer of generator N is the image of the simulation scene.
- the input information of the generative adversarial network is input to the convolutional layer of generator N.
- the input information of the generative adversarial network is down-sampled and then input to the convolutional layer of generator 2.
- the input information of the generative adversarial network is down-sampled again and then input to the convolutional layer of generator 1.
- the purpose of down-sampling is to reduce resolution, for example by a ratio of 1/2 × 1/2: if the output of generator N is 200 × 200, then the output of generator 2 is 100 × 100 and the output of generator 1 is 50 × 50. Generator N thus works at high resolution and attends more to details, while generator 2 and generator 1 work at low resolution and attend more to the whole. The high-definition images output by the generative adversarial network are therefore more plausible, attending to both the whole and the details.
- the generative adversarial network, the embedding network, the conditioning augmentation network, and the autoencoder network are obtained through joint training.
- the joint training may include: acquiring semantic segmentation information, instance segmentation information, instance text information, and sample images of the sample scene; and then performing joint training based on the semantic segmentation information, instance segmentation information, instance text information, and sample images.
- the joint training of the generative adversarial network, the embedding network, the conditioning augmentation network, and the autoencoder network based on the semantic segmentation information, instance segmentation information, instance text information, and sample images proceeds as follows:
- the instance text information is input into the embedding network, whose output passes through the conditioning augmentation network to obtain a processing result; the sample images are input into the encoder of the autoencoder network to obtain the hidden variable of each sample image, the hidden variables are sampled, and the decoder decodes the processing result and the sampling result into a feature map; the semantic segmentation information, the instance segmentation information, and the feature map are input to the generators, and the deconvolutional layer of the outermost generator outputs a generated image; the generated image, the sample images, the semantic segmentation information, the instance segmentation information, and the feature map are then passed through the discriminant network to complete training.
- the generated images output by the generative adversarial network are fake pictures,
- and their feature values are marked as "fake" to indicate that they are generated pictures rather than real pictures.
- the sample images are real pictures, and their feature values can be marked as "real". The generated image, sample image, semantic segmentation information, instance segmentation information, and feature map are passed to the discriminant network.
- through training, the discriminant network learns to judge real and fake pictures more accurately, thereby providing feedback to the generative adversarial network so that it generates fake pictures that pass for real.
- the discriminant network continues training while the judgment probability value output by any discriminator has not converged to 0.5, iterating until the training goal is met.
- the "training goal" may be a preset target for whether the images generated by the generative adversarial network meet requirements.
- the training goal of the generative adversarial network may be, for example, that the predicted feature value of a picture meets a specified requirement, for example converging to 0.5; once convergence to 0.5 is satisfied, training stops.
- the discriminant network is composed of multiple cascaded discriminators; the input to the topmost discriminator is the generated image, the sample image, the semantic segmentation information, the instance segmentation information, and the feature map; the generated image, sample image, semantic segmentation information, instance segmentation information, and feature map are down-sampled and then input to the lower-level discriminators, where the down-sampling multiples corresponding to discriminators at different levels may differ.
- the architecture of the discriminant network is shown in FIG. 7; it is composed of N (N ≥ 3) cascaded discriminators, denoted, from top to bottom, discriminator 1, discriminator 2, ..., discriminator N.
- the input information of the discriminant network is the generated image, the sample image, the semantic segmentation information, the instance segmentation information, and the feature map.
- the output information of the discriminant network is the judgment probability values.
- the input information of the discriminant network is input to discriminator 1.
- the input information of the discriminant network is down-sampled and then input to discriminator 2.
- the input information of the discriminant network is down-sampled again and then input to discriminator N. If the judgment probability values output by discriminator 1, discriminator 2, ..., and discriminator N all converge to 0.5, the joint training ends; a minimal sketch of a joint-training step follows.
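- A deliberately simplified joint-training step tying the earlier sketches together; it assumes the hypothetical `ConditioningAugmentation`, `FeatureMapAutoencoder`, `gens`/`nested_forward`, and `discs`/`discriminate` defined above, uses a stand-in embedding network, and applies standard adversarial losses together with the text's stopping criterion of the judgment probabilities converging to 0.5 (the patent does not specify exact losses):

```python
import torch
import torch.nn as nn

embed_net = nn.EmbeddingBag(10000, 128)      # stand-in for the pre-trained embedding network
ca_net = ConditioningAugmentation()          # from the earlier sketch
autoencoder = FeatureMapAutoencoder()        # from the earlier sketch
bce = nn.BCELoss()
g_params = (list(embed_net.parameters()) + list(ca_net.parameters()) +
            list(autoencoder.parameters()) + list(gens.parameters()))
opt_g = torch.optim.Adam(g_params, lr=2e-4)
opt_d = torch.optim.Adam(discs.parameters(), lr=2e-4)

# Dummy loader: (sample image, semantic map, instance map, text token ids) for one sample scene.
loader = [(torch.randn(1, 3, 256, 512), torch.randn(1, 20, 256, 512),
           torch.randn(1, 1, 256, 512), torch.randint(0, 10000, (1, 8)))]

for sample_image, semantic, instance, text_tokens in loader:
    cond = ca_net(embed_net(text_tokens))            # embedding + conditioning augmentation
    feature_map = autoencoder(sample_image, cond)    # encode, sample hidden variable, decode
    generated = nested_forward(gens, torch.cat([semantic, instance, feature_map], dim=1))

    # Discriminant-network step: sample image labeled "real" (1), generated image "fake" (0).
    d_fake = discriminate(discs, generated.detach(), semantic, instance, feature_map.detach())
    d_real = discriminate(discs, sample_image, semantic, instance, feature_map.detach())
    d_loss = (sum(bce(p, torch.zeros_like(p)) for p in d_fake) +
              sum(bce(p, torch.ones_like(p)) for p in d_real))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: try to make every discriminator level judge the generated image "real".
    g_loss = sum(bce(p, torch.ones_like(p)) for p in
                 discriminate(discs, generated, semantic, instance, feature_map))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

    # Stopping criterion from the text: every judgment probability converges to 0.5.
    if all(abs(float(p) - 0.5) < 0.05 for p in d_fake + d_real):
        break
```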
- the embodiments of the present disclosure also provide a non-transitory computer-readable storage medium that stores a program or instruction, the program or instruction causing a computer to execute the image generation method of a simulation scene described above; to avoid repetition, details are not repeated here.
- in the embodiments of the present disclosure, only the scene white model needs to be established; an image of the simulated scene can then be generated based on the semantic segmentation information and instance segmentation information of the scene white model, without refining color, texture, lighting, and other attributes during scene creation,
- which improves generation efficiency. Moreover, the instance text information is editable, and different instance text information describes the attributes of different instances and corresponds to different instances, which diversifies the simulation scene. The present disclosure therefore has industrial applicability.
Claims (14)
- 1. A method for generating an image of a simulation scene, characterized in that the method comprises: acquiring semantic segmentation information and instance segmentation information of a scene white model; receiving instance text information of the scene white model, the instance text information being editable information used to describe attributes of an instance; and generating an image of the simulation scene based on the semantic segmentation information, the instance segmentation information, the instance text information, and a pre-trained generative adversarial network.
- 2. The method according to claim 1, characterized in that the scene white model is established by a simulation engine, and the semantic segmentation information and instance segmentation information of the scene white model are generated by the simulation engine based on the scene white model.
- 3. The method according to claim 1, characterized in that generating an image of the simulation scene based on the semantic segmentation information, the instance segmentation information, the instance text information, and the pre-trained generative adversarial network comprises: generating a feature map based on the instance text information and at least one real image corresponding to the scene white model; and generating an image of the simulation scene through the pre-trained generative adversarial network based on the semantic segmentation information, the instance segmentation information, and the feature map.
- 4. The method according to claim 3, characterized in that generating a feature map based on the instance text information and at least one real image corresponding to the scene white model comprises: performing embedding processing and conditioning augmentation processing on the instance text information to obtain a processing result; encoding the at least one real image corresponding to the scene white model to obtain a hidden variable corresponding to each real image; sampling the hidden variable corresponding to each real image to obtain a sampling result; and decoding the processing result and the sampling result to generate the feature map.
- 5. The method according to claim 4, characterized in that performing embedding processing and conditioning augmentation processing on the instance text information to obtain a processing result comprises: inputting the instance text information into a pre-trained embedding network, the output of the embedding network being passed through a pre-trained conditioning augmentation network to obtain the processing result.
- 6. The method according to claim 5, characterized in that the at least one real image corresponding to the scene white model is input into an encoder of a pre-trained autoencoder network for encoding, obtaining the hidden variable corresponding to each real image; the autoencoder network samples the hidden variable corresponding to each real image to obtain the sampling result; and a decoder of the autoencoder network decodes the processing result and the sampling result to generate the feature map.
- 7. The method according to claim 3, characterized in that the generative adversarial network is composed of multiple nested generators, wherein each generator comprises a convolutional layer and a deconvolutional layer, and the last feature-map output of the deconvolutional layer of an inner nested generator serves as an input to the deconvolutional layer of the generator nested outside it.
- 8. The method according to claim 7, characterized in that generating an image of the simulation scene through the pre-trained generative adversarial network based on the semantic segmentation information, the instance segmentation information, and the feature map comprises: inputting the semantic segmentation information, the instance segmentation information, and the feature map into the convolutional layer of the outermost generator of the generative adversarial network; down-sampling the semantic segmentation information, the instance segmentation information, and the feature map and inputting them into the convolutional layers of the inner generators of the generative adversarial network, wherein the down-sampling multiples corresponding to different inner generators differ; and outputting the image of the simulation scene from the deconvolutional layer of the outermost generator of the generative adversarial network.
- 9. The method according to claim 6, characterized in that the generative adversarial network, the embedding network, the conditioning augmentation network, and the autoencoder network are obtained through joint training.
- 10. The method according to claim 9, characterized in that the joint training comprises: acquiring semantic segmentation information, instance segmentation information, instance text information, and sample images of a sample scene; and performing joint training based on the semantic segmentation information, the instance segmentation information, the instance text information, and the sample images.
- 11. The method according to claim 10, characterized in that performing joint training based on the semantic segmentation information, instance segmentation information, instance text information, and sample images comprises: inputting the instance text information into the embedding network, the output of the embedding network passing through the conditioning augmentation network to obtain a processing result; inputting the sample images into the encoder of the autoencoder network for encoding to obtain a hidden variable corresponding to each sample image; the autoencoder network sampling the hidden variable corresponding to each sample image to obtain a sampling result; the decoder of the autoencoder network decoding the processing result and the sampling result to generate a feature map; inputting the semantic segmentation information, the instance segmentation information, and the feature map into the convolutional layer of the outermost generator of the generative adversarial network; down-sampling the semantic segmentation information, the instance segmentation information, and the feature map and inputting them into the convolutional layers of the inner generators of the generative adversarial network; outputting a generated image from the deconvolutional layer of the outermost generator of the generative adversarial network; and passing the generated image, the sample images, the semantic segmentation information, the instance segmentation information, and the feature map through a discriminant network to complete training.
- 12. The method according to claim 11, characterized in that the discriminant network is composed of multiple cascaded discriminators; the input of the topmost discriminator is the generated image, the sample image, the semantic segmentation information, the instance segmentation information, and the feature map; the generated image, the sample image, the semantic segmentation information, the instance segmentation information, and the feature map are down-sampled and then input into the lower-level discriminators, wherein the down-sampling multiples corresponding to discriminators at different levels differ.
- 13. An electronic device, characterized by comprising: a processor and a memory; the processor being configured to execute the steps of the method according to any one of claims 1 to 12 by calling a program or instruction stored in the memory.
- 14. A non-transitory computer-readable storage medium, characterized in that the non-transitory computer-readable storage medium stores a program or instruction, the program or instruction causing a computer to execute the steps of the method according to any one of claims 1 to 12.
Priority Applications (2)
- CN201980002612.5A (granted as CN110998663B), priority date 2019-11-22, filed 2019-11-22: Image generation method for a simulation scene, electronic device, and storage medium
- PCT/CN2019/120408 (published as WO2021097845A1), priority date 2019-11-22, filed 2019-11-22: Image generation method for a simulation scene, electronic device, and storage medium

Publications (1)
- WO2021097845A1, published 2021-05-27
Also Published As
- CN110998663A, published 2020-04-10
- CN110998663B, published 2023-12-01