WO2024096145A1

WO2024096145A1 - Mobility apparatus and method for generating transmission or reception signal in wireless communication system

Info

Publication number: WO2024096145A1
Application number: PCT/KR2022/016830
Authority: WO
Inventors: 정익주; 이상림; 이태현
Original assignee: 엘지전자 주식회사
Priority date: 2022-10-31
Filing date: 2022-10-31
Publication date: 2024-05-10

Abstract

The present disclosure may provide an operating method of a first device in a wireless communication system. The operating method may comprise the steps of: receiving, from a second device, a request for capability information regarding the first device; transmitting the capability information of the first device to the second device; if the first device is determined, on the basis of the capability information of the first device, to be a device having a semantic communication capability, receiving semantic communication-related information from the second device; generating a semantic communication signal on the basis of the semantic communication-related information; and transmitting, to the second device, the semantic communication signal and information for performing weakly labeling. Here, the semantic communication signal may be generated by using shared information and the information for performing the weakly labeling, the weakly labeling may be an operation of assigning a weak label to semantic data by using the shared information and assistance information, and updating of the shared information may be performed on the basis of an operation of a downstream task performed by the second device.

Description

Mobility device and method for generating transmission and reception signals in a wireless communication system

The following description is about a wireless communication system, and relates to an apparatus and method for generating transmission and reception signals in a wireless communication system.

Specifically, a method and device for performing a downstream task based on a task-oriented operation in semantic communication can be provided. In addition, a method and device for generating a signal for performing a downstream task based on a weakly supervised contrastive learning technique can be provided.

Wireless access systems are being widely deployed to provide various types of communication services such as voice and data. In general, a wireless access system is a multiple access system that can support communication with multiple users by sharing available system resources (bandwidth, transmission power, etc.). Examples of multiple access systems include code division multiple access (CDMA) systems, frequency division multiple access (FDMA) systems, time division multiple access (TDMA) systems, orthogonal frequency division multiple access (OFDMA) systems, and single carrier frequency division multiple access (SC-FDMA) systems.

In particular, as many communication devices require large communication capacity, enhanced mobile broadband (eMBB) communication technology that improves on the existing radio access technology (RAT) is being proposed. In addition, communication systems are being proposed that take into account services and user equipment (UE) sensitive to reliability and latency, as well as massive machine type communications (mMTC), which connects multiple devices and objects to provide a variety of services anytime and anywhere. Various technological configurations are being proposed for this purpose.

This disclosure relates to an apparatus and method for generating transmission and reception signals in a wireless communication system.

The present disclosure can provide an apparatus and method for transmitting and receiving signals between semantic layers located at a source and a destination in a wireless communication system.

The present disclosure can provide an apparatus and method for learning how to generate a signal using weakly-supervised contrastive learning in a wireless communication system.

The present disclosure can provide an apparatus and method for performing weakly labeling in a wireless communication system.

The present disclosure can provide a method for generating a signal for performing a downstream task of a destination in a wireless communication system.

The present disclosure may provide an apparatus and method for updating background knowledge held at a source and a destination in a wireless communication system.

The present disclosure can provide an apparatus and method for updating learning information for generating signals in a wireless communication system.

The technical objectives sought to be achieved by the present disclosure are not limited to the matters mentioned above, and other technical objectives not mentioned may be understood, from the embodiments of the present disclosure described below, by those having ordinary skill in the technical field to which the technical configuration of the present disclosure is applied.

As an example of the present disclosure, a method of operating a first device in a wireless communication system may include: receiving a capability information request for the first device from a second device; transmitting capability information of the first device to the second device; receiving semantic communication-related information from the second device if the first device is determined, based on the capability information of the first device, to be a device having semantic communication capability; generating a semantic communication signal based on the semantic communication-related information; and transmitting the semantic communication signal and information for performing weakly labeling to the second device. Here, the semantic communication signal is generated using shared information and the information for performing the weakly labeling, the weakly labeling is an operation of assigning a weak label to semantic data using the shared information and auxiliary information, and the update of the shared information may be performed based on the operation of a downstream task performed in the second device.
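The message order described above can be sketched as follows. This is only an illustrative sketch, not the disclosed implementation: the class `FirstDevice`, the function `run_exchange`, the message names, and the `mini_batch_size` value are all hypothetical, and the signal generator is a placeholder rather than a real encoder.

```python
from dataclasses import dataclass, field


@dataclass
class FirstDevice:
    """Source side; `semantic_capable` stands in for the capability information."""
    semantic_capable: bool
    config: dict = field(default_factory=dict)

    def capability_info(self) -> dict:
        return {"semantic_capable": self.semantic_capable}

    def generate_signal(self, shared_info: set, aux_info: dict) -> dict:
        # Placeholder for the real encoder: the "signal" only records its inputs.
        return {"signal": sorted(shared_info), "weak_label_info": aux_info}


def run_exchange(first: FirstDevice, shared_info: set, aux_info: dict) -> list:
    """Return the ordered (hypothetical) message names exchanged with the second device."""
    log = ["capability_request"]                        # second -> first
    capability = first.capability_info()
    log.append("capability_info")                       # first -> second
    if not capability["semantic_capable"]:
        return log                                      # no semantic communication follows
    first.config = {"mini_batch_size": 32}              # illustrative value only
    log.append("semantic_communication_info")           # second -> first
    first.generate_signal(shared_info, aux_info)
    log.append("semantic_signal_and_weak_label_info")   # first -> second
    return log
```

In this sketch, a device that reports no semantic communication capability stops after the capability exchange, which mirrors the conditional step in the method.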

As an example of the present disclosure, a method of operating a second device in a wireless communication system may include: transmitting a capability information request to a first device; receiving capability information from the first device; transmitting semantic communication-related information to the first device if the first device is determined, based on the capability information of the first device, to be a device having semantic communication capability; and receiving, from the first device, a semantic communication signal generated based on the semantic communication-related information and information for performing weakly labeling. Here, the semantic communication signal is generated using shared information and the information for performing the weakly labeling, the weakly labeling is an operation of assigning a weak label to semantic data using the shared information and auxiliary information, and the update of the shared information may be performed based on the operation of the downstream task performed in the second device.

As an example of the present disclosure, in a wireless communication system, a first device includes a transceiver and a processor connected to the transceiver, wherein the processor may control the first device to: receive a capability information request for the first device from a second device; transmit the capability information of the first device to the second device; receive semantic communication-related information from the second device if the first device is determined, based on the capability information of the first device, to be a device having semantic communication capability; generate a semantic communication signal based on the semantic communication-related information; and transmit the semantic communication signal and information for performing weakly labeling to the second device. Here, the semantic communication signal is generated using shared information and the information for performing the weakly labeling, the weakly labeling is an operation of assigning a weak label to semantic data using the shared information and auxiliary information, and the update of the shared information may be performed based on the operation of a downstream task performed in the second device.

As an example of the present disclosure, a second device includes a transceiver and a processor connected to the transceiver, wherein the processor may control the second device to: transmit a capability information request to a first device; receive capability information from the first device; transmit semantic communication-related information to the first device if the first device is determined, based on the capability information of the first device, to be a device having semantic communication capability; and receive, from the first device, a semantic communication signal generated based on the semantic communication-related information and information for performing weakly labeling. Here, the semantic communication signal is generated using shared information and the information for performing the weakly labeling, the weakly labeling is an operation of assigning a weak label to semantic data using the shared information and auxiliary information, and the update of the shared information may be performed based on the operation of a downstream task performed in the second device.

As an example of the present disclosure, a first device includes at least one memory and at least one processor functionally connected to the at least one memory, wherein the at least one processor may control the first device to: receive a capability information request for the first device from a second device; transmit capability information of the first device to the second device; receive semantic communication-related information from the second device if the first device is determined, based on the capability information of the first device, to be a device having semantic communication capability; generate a semantic communication signal based on the semantic communication-related information; and transmit the semantic communication signal and information for performing weakly labeling to the second device. Here, the semantic communication signal is generated using shared information and the information for performing the weakly labeling, the weakly labeling is an operation of assigning a weak label to semantic data using the shared information and auxiliary information, and the update of the shared information may be performed based on the operation of the downstream task performed in the second device.

As an example of the present disclosure, a non-transitory computer-readable medium stores at least one instruction executable by a processor, wherein the at least one instruction may control a device to: receive a capability information request from a second device; transmit capability information to the second device; receive semantic communication-related information from the second device if semantic communication capability is confirmed based on the capability information; generate a semantic communication signal based on the semantic communication-related information; and transmit the semantic communication signal and information for performing weakly labeling to the second device. Here, the semantic communication signal is generated using shared information and the information for performing the weakly labeling, the weakly labeling is an operation of assigning a weak label to semantic data using the shared information and auxiliary information, and the update of the shared information may be performed based on the operation of the downstream task performed in the second device.

As an example of the present disclosure, the semantic communication signal may be used to perform a downstream task without being decoded by the second device into the raw data that the first device used to generate the representation.

As an example of the present disclosure, the capability information is information for determining whether the first device can perform semantic communication, and may include the type of raw data that the first device can process and computing capability information of the first device.

As an example of the present disclosure, the semantic communication-related information may include at least one of a semantic data acquisition unit, the information for performing the weakly labeling, a mini-batch size, an augmentation type, an augmentation ratio, and configuration information of the encoding model. Here, the semantic data is data extracted from the raw data, the information for performing the weakly labeling includes auxiliary information, and the acquisition unit, the augmentation type, and the augmentation ratio may be determined based on shared information between the first device and the second device.
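As a rough illustration, the semantic communication-related information could be held in a container like the one below. The class name `SemanticCommInfo`, the field names and types, and the validation rules (for instance, treating the augmentation ratio as a fraction between 0 and 1) are assumptions made for this sketch and are not specified by the disclosure.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class SemanticCommInfo:
    """Hypothetical container; every field is optional ("at least one of")."""
    acquisition_unit: Optional[str] = None      # unit in which semantic data is extracted
    weak_label_info: Optional[dict] = None      # includes the auxiliary information
    mini_batch_size: Optional[int] = None
    augmentation_type: Optional[str] = None
    augmentation_ratio: Optional[float] = None  # assumed here to be a fraction in [0, 1]
    encoder_config: Optional[dict] = None       # configuration of the encoding model

    def validate(self) -> None:
        """Raise ValueError if no field is set or a numeric field is out of range."""
        if all(value is None for value in asdict(self).values()):
            raise ValueError("at least one field must be provided")
        if self.mini_batch_size is not None and self.mini_batch_size <= 0:
            raise ValueError("mini_batch_size must be positive")
        if self.augmentation_ratio is not None and not 0.0 <= self.augmentation_ratio <= 1.0:
            raise ValueError("augmentation_ratio must lie in [0, 1]")
```

A receiver could call `validate()` on a received configuration before using it to set up signal generation.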

As an example of the present disclosure, the method may further include obtaining semantic data from raw data, performing weakly labeling on the semantic data to generate weakly labeled semantic data, and generating augmentation data from the weakly labeled semantic data.
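The three steps above (semantic data acquisition, weakly labeling, augmentation) can be sketched on toy text data. Everything specific here is an assumption chosen for illustration: tokens stand in for semantic data, a keyword lookup in the shared information stands in for the weak-labeling rule, and random token dropping stands in for the augmentation type. The disclosure does not prescribe these choices.

```python
import random


def extract_semantic_data(raw: str) -> list:
    """Toy acquisition step: lowercase whitespace tokens stand in for semantic data."""
    return raw.lower().split()


def weak_label(tokens: list, shared_info: dict, aux_info: dict) -> str:
    """Assign a weak label: first keyword match in shared info, else an auxiliary default."""
    for token in tokens:
        if token in shared_info:
            return shared_info[token]
    return aux_info.get("default_label", "unknown")


def augment(tokens: list, ratio: float, rng: random.Random) -> list:
    """Token dropping: drop each token with probability `ratio`, keeping at least one."""
    kept = [t for t in tokens if rng.random() >= ratio]
    return kept or tokens[:1]


def build_sample(raw: str, shared_info: dict, aux_info: dict,
                 ratio: float = 0.2, seed: int = 0) -> dict:
    """Run acquisition, weak labeling, and augmentation; return the label and two views."""
    tokens = extract_semantic_data(raw)
    label = weak_label(tokens, shared_info, aux_info)
    rng = random.Random(seed)
    views = [augment(tokens, ratio, rng) for _ in range(2)]
    return {"label": label, "views": views}
```

Two augmented views per sample are produced because contrastive learning typically compares views of the same or similarly labeled data; the number of views is likewise an assumption of this sketch.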

As an example of the present disclosure, the shared information update is performed using a signal converted from the semantic communication signal, and the converted signal may be generated based on a data format used to perform a downstream task.

As an example of the present disclosure, the shared information update is performed using a transform head, and the transform head may include at least one dense layer and at least one non-linear function.
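A transform head of this kind can be sketched in plain Python as dense layers separated by a non-linear function. The two-layer depth, the choice of ReLU as the non-linearity, and the function names are assumptions made for illustration; the disclosure only requires at least one dense layer and at least one non-linear function.

```python
def dense(x: list, weights: list, bias: list) -> list:
    """Fully connected layer: y[j] = sum_i x[i] * weights[i][j] + bias[j]."""
    return [
        sum(xi * row[j] for xi, row in zip(x, weights)) + bias[j]
        for j in range(len(bias))
    ]


def relu(x: list) -> list:
    """Element-wise non-linear function (ReLU is an assumed choice)."""
    return [max(0.0, v) for v in x]


def transform_head(x: list, layer1: tuple, layer2: tuple) -> list:
    """Apply dense -> ReLU -> dense; each layer is a (weights, bias) pair."""
    hidden = relu(dense(x, *layer1))
    return dense(hidden, *layer2)
```

For example, with an identity first layer, the input `[1.0, -2.0]` becomes `[1.0, 0.0]` after the ReLU, and the second layer then maps that hidden vector to the output.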

As an example of the present disclosure, the shared information update may be performed using at least one of a representation used in pre-learning, a representation used in learning to perform a downstream task, and a representation used in inference.

As an example of the present disclosure, learning for the downstream task may be performed based on the first layer of the transform head and at least one layer determined for performing the downstream task.

As an example of the present disclosure, learning for the downstream task may include a fine-tuning operation or a transfer-learning operation.

As an example of the present disclosure, the fine-tuning operation may be performed, after pre-learning is completed, on the entire network, including the neural network added according to the downstream task, using the weights of the encoder, the weights for the additional operation, and the weights for the first layer of the transform head.

As an example of the present disclosure, the transfer-learning operation may be performed, after pre-learning is completed, on a multi-layer perceptron (MLP) added according to the downstream task, with the weights of the encoder, the weights for the additional operation, and the weights for the first layer of the transform head being fixed.
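The difference between the fine-tuning and transfer-learning operations comes down to which weight groups are updated after pre-learning. The sketch below makes that explicit with scalar stand-ins for the weight groups. The group names, the `sgd_step` helper, and the use of plain gradient descent are hypothetical simplifications and not part of the disclosure.

```python
# Weight groups carried over from pre-learning (names are illustrative).
PRETRAINED_GROUPS = ("encoder", "additional_operation", "transform_head_layer1")


def trainable_groups(mode: str, downstream_groups: tuple = ("downstream_mlp",)) -> set:
    """Return the weight groups that are updated in the given learning mode."""
    if mode == "fine_tuning":
        # The whole network is updated, starting from the pre-learned weights.
        return set(PRETRAINED_GROUPS) | set(downstream_groups)
    if mode == "transfer_learning":
        # Pre-learned weights stay fixed; only the added MLP is updated.
        return set(downstream_groups)
    raise ValueError(f"unknown mode: {mode}")


def sgd_step(params: dict, grads: dict, trainable: set, lr: float = 0.1) -> dict:
    """One gradient-descent step that leaves non-trainable (frozen) groups unchanged."""
    return {
        name: value - lr * grads[name] if name in trainable else value
        for name, value in params.items()
    }
```

With `mode="transfer_learning"`, a step changes only `downstream_mlp`, whereas with `mode="fine_tuning"` every group moves.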

As an example of the present disclosure, the semantic communication signal may be transmitted on a layer for semantic communication.

The following effects may be achieved by embodiments based on the present disclosure.

In embodiments based on the present disclosure, a method for transmitting and receiving source and destination signals in semantic communication can be provided.

In embodiments based on the present disclosure, a method for transmitting and receiving signals between semantic layers located at a source and a destination can be provided.

In embodiments based on the present disclosure, a source may provide a method for generating a signal suitable for a downstream task at a destination.

In embodiments based on the present disclosure, a method of performing learning for signal generation using weakly supervised contrastive learning may be provided.

In embodiments based on the present disclosure, a method of performing weakly labeling may be provided.

In embodiments based on the present disclosure, a learning method for generating a signal suitable for a downstream task of the destination may be provided.

In embodiments based on the present disclosure, a method may be provided to update background knowledge held by the source and destination in order to perform a downstream task located at the destination in a task-oriented manner.

The effects that can be obtained from the embodiments of the present disclosure are not limited to the effects mentioned above, and other effects not mentioned can be clearly derived and understood, from the description of the embodiments below, by those having ordinary skill in the technical field to which the technical configuration of the present disclosure is applied. That is, unintended effects resulting from implementing the configuration described in this disclosure may also be derived by a person skilled in the art from the embodiments of this disclosure.

The drawings attached below are intended to aid understanding of the present disclosure and may provide embodiments of the present disclosure along with a detailed description. However, the technical features of the present disclosure are not limited to specific drawings, and the features disclosed in each drawing may be combined to form a new embodiment. Reference numerals in each drawing may refer to structural elements.

Figure 1 is a diagram showing an example of a communication system applicable to the present disclosure.

Figure 2 is a diagram showing an example of a wireless device applicable to the present disclosure.

Figure 3 is a diagram showing another example of a wireless device applicable to the present disclosure.

Figure 4 is a diagram showing an example of AI (Artificial Intelligence) applicable to the present disclosure.

Figure 5 shows an example of a communication model divided into three stages according to an embodiment of the present disclosure.

Figure 6 shows an example of a semantic communication system according to an embodiment of the present disclosure.

Figure 7 shows an example of contrastive learning according to an embodiment of the present disclosure.

Figure 8 shows an example of instance discrimination for contrastive learning according to an embodiment of the present disclosure.

Figure 9 shows an example of augmentation data according to an embodiment of the present disclosure.

Figure 10 shows an example of a problem that may occur when performing contrastive learning according to an embodiment of the present disclosure.

Figure 11 shows an example framework for pre-learning according to an embodiment of the present disclosure.

Figure 12 shows an example of semantic data generation according to an embodiment of the present disclosure.

Figure 13 shows an example of weakly labeling based on discrete attributes according to an embodiment of the present disclosure.

Figure 14 shows an example of weakly labeling based on a hierarchical level according to an embodiment of the present disclosure.

Figure 15 shows the performance of edge perturbation according to an embodiment of the present disclosure.

Figure 16 shows an example of an additional data conversion operation when the data modality is a graph.

Figure 17 shows an example of an additional data conversion operation when the data modality is text.

Figure 18 shows an example of a transform head configuration according to an embodiment of the present disclosure.

Figure 19 shows a graph of the results of performing weakly labeling based on discrete attributes according to an embodiment of the present disclosure.

Figure 20 shows a graph of the relationship between parameters related to weakly labeling based on hierarchical levels according to an embodiment of the present disclosure.

Figure 21 shows an example framework for training and inference according to a downstream task according to an embodiment of the present disclosure.

Figure 22 shows an example of a semantic signal generation operation procedure according to an embodiment of the present disclosure.

Figure 23 shows an example of a signal diagram for initial setup of semantic communication according to an embodiment of the present disclosure.

Figure 24 shows an example of an information exchange diagram in a mini-batch unit according to an embodiment of the present disclosure.

The following embodiments combine the elements and features of the present disclosure in a predetermined form. Each component or feature may be considered optional unless explicitly stated otherwise. Each component or feature may be implemented in a form that is not combined with other components or features. Additionally, some components and/or features may be combined to configure an embodiment of the present disclosure. The order of operations described in embodiments of the present disclosure may be changed. Some components or features of one embodiment may be included in another embodiment or may be replaced with corresponding components or features of another embodiment.

In the description of the drawings, procedures or steps that may obscure the gist of the present disclosure are not described, nor are procedures or steps that a person skilled in the art could readily understand.

Throughout the specification, when a part is said to “comprise or include” a certain element, this means that it does not exclude other elements but may further include other elements, unless specifically stated to the contrary. In addition, terms such as "... unit", "...er", and "module" used in the specification refer to a unit that processes at least one function or operation, which can be implemented as hardware, software, or a combination of hardware and software. Additionally, the terms “a or an,” “one,” “the,” and similar related terms may be used in the context of describing the present disclosure (particularly in the context of the claims below) to cover both the singular and the plural, unless indicated otherwise herein or clearly contradicted by context.

In this specification, embodiments of the present disclosure have been described focusing on the data transmission and reception relationship between the base station and the mobile station. However, the present disclosure is not limited to data transmission and reception between the base station and the mobile station, and may be implemented in various forms, such as data transmission and reception between the mobile station and the mobile station. Here, the base station is meant as a terminal node of the network that directly communicates with the mobile station. Certain operations described in this document as being performed by the base station may, in some cases, be performed by an upper node of the base station.

That is, in a network comprised of a plurality of network nodes including a base station, various operations performed for communication with a mobile station may be performed by the base station or by network nodes other than the base station. Here, 'base station' may be replaced by terms such as fixed station, Node B, eNB (eNode B), gNB (gNode B), ng-eNB, advanced base station (ABS), or access point.

Additionally, in embodiments of the present disclosure, 'terminal' may be replaced with terms such as user equipment (UE), mobile station (MS), subscriber station (SS), mobile subscriber station (MSS), mobile terminal, or advanced mobile station (AMS).

Additionally, the transmitting end refers to a fixed and/or mobile node that provides a data service or a voice service, and the receiving end refers to a fixed and/or mobile node that receives a data service or a voice service. Therefore, in the case of uplink, the mobile station can be the transmitting end and the base station can be the receiving end. Likewise, in the case of downlink, the mobile station can be the receiving end and the base station can be the transmitting end.

Embodiments of the present disclosure may be supported by standard documents disclosed for at least one of the following wireless access systems: the IEEE 802.xx system, the 3GPP (3rd Generation Partnership Project) system, the 3GPP LTE (Long Term Evolution) system, the 3GPP 5G (5th generation) NR (New Radio) system, and the 3GPP2 system. In particular, embodiments of the present disclosure may be supported by the 3GPP TS (technical specification) 38.211, 3GPP TS 38.212, 3GPP TS 38.213, 3GPP TS 38.321, and 3GPP TS 38.331 documents.

Additionally, embodiments of the present disclosure can be applied to other wireless access systems and are not limited to the above-described system. As an example, it may be applicable to systems applied after the 3GPP 5G NR system and is not limited to a specific system.

That is, obvious steps or parts that are not described among the embodiments of the present disclosure can be explained with reference to the documents. Additionally, all terms disclosed in this document can be explained by the standard document.

Hereinafter, preferred embodiments according to the present disclosure will be described in detail with reference to the attached drawings. The detailed description to be disclosed below along with the accompanying drawings is intended to describe exemplary embodiments of the present disclosure, and is not intended to represent the only embodiments in which the technical features of the present disclosure may be practiced.

Additionally, specific terms used in the embodiments of the present disclosure are provided to aid understanding of the present disclosure, and the use of such specific terms may be changed to other forms without departing from the technical spirit of the present disclosure.

The following technology can be applied to various wireless access systems such as code division multiple access (CDMA), frequency division multiple access (FDMA), time division multiple access (TDMA), orthogonal frequency division multiple access (OFDMA), and single carrier frequency division multiple access (SC-FDMA).

For clarity of explanation, the following description is based on the 3GPP communication system (e.g., LTE, NR, etc.), but the technical idea of the present invention is not limited thereto. LTE refers to technology after 3GPP TS 36.xxx Release 8. In detail, the LTE technology after 3GPP TS 36.xxx Release 10 is referred to as LTE-A, and the LTE technology after 3GPP TS 36.xxx Release 13 is referred to as LTE-A pro. 3GPP NR may mean technology after TS 38.xxx Release 15, and 3GPP 6G may mean technology after TS Release 17 and/or Release 18. Here, “xxx” denotes the detailed number of the standard document. LTE/NR/6G can be collectively referred to as a 3GPP system.

Regarding background technology, terms, abbreviations, etc. used in the present disclosure, reference may be made to matters described in standard documents published prior to the present invention. As an example, you can refer to the 36.xxx and 38.xxx standard documents.

Communication systems applicable to this disclosure

Although not limited thereto, the various descriptions, functions, procedures, suggestions, methods, and/or operational flowcharts of the present disclosure disclosed in this document can be applied to various fields requiring wireless communication/connection (e.g., 5G) between devices.

Hereinafter, a more detailed example will be provided with reference to the drawings. In the following drawings/descriptions, identical reference numerals may illustrate identical or corresponding hardware blocks, software blocks, or functional blocks, unless otherwise noted.

FIG. 1 is a diagram illustrating an example of a communication system applied to the present disclosure.

Referring to FIG. 1, the communication system 100 applied to the present disclosure includes a wireless device, a base station, and a network. Here, a wireless device refers to a device that performs communication using wireless access technology (e.g., 5G NR, LTE) and may be referred to as a communication/wireless/5G device. Although not limited thereto, wireless devices may include a robot 100a, vehicles 100b-1 and 100b-2, an extended reality (XR) device 100c, a hand-held device 100d, a home appliance 100e, an Internet of Things (IoT) device 100f, and an artificial intelligence (AI) device/server 100g. For example, vehicles may include vehicles equipped with wireless communication functions, autonomous vehicles, vehicles capable of inter-vehicle communication, etc. Here, the vehicles 100b-1 and 100b-2 may include an unmanned aerial vehicle (UAV) (e.g., a drone). The XR device 100c includes augmented reality (AR)/virtual reality (VR)/mixed reality (MR) devices and may be implemented in the form of a head-mounted device (HMD), a head-up display (HUD) installed in a vehicle, a television, a smartphone, a computer, a wearable device, a home appliance, digital signage, a vehicle, a robot, etc. The hand-held device 100d may include a smartphone, smart pad, wearable device (e.g., smart watch, smart glasses), computer (e.g., laptop, etc.), etc. The home appliance 100e may include a TV, refrigerator, washing machine, etc. The IoT device 100f may include sensors, smart meters, etc. For example, the base station 120 and the network 130 may also be implemented as wireless devices, and a specific wireless device 120a may operate as a base station/network node for other wireless devices.

Wireless devices 100a to 100f may be connected to the network 130 through the base station 120. AI technology may be applied to the wireless devices 100a to 100f, and the wireless devices 100a to 100f may be connected to the AI server 100g through the network 130. The network 130 may be configured using a 3G network, 4G (e.g., LTE) network, or 5G (e.g., NR) network. Wireless devices 100a to 100f may communicate with each other through the base station 120/network 130, but may also communicate directly (e.g., sidelink communication) without going through the base station 120/network 130. For example, vehicles 100b-1 and 100b-2 may communicate directly (e.g., vehicle to vehicle (V2V)/vehicle to everything (V2X) communication). Additionally, the IoT device 100f (e.g., sensor) may communicate directly with other IoT devices (e.g., sensors) or other wireless devices 100a to 100f.

Wireless devices applicable to this disclosure

FIG. 2 is a diagram illustrating an example of a wireless device applicable to the present disclosure.

Referring to FIG. 2, the first wireless device 200a and the second wireless device 200b can transmit and receive wireless signals through various wireless access technologies (e.g., LTE, NR). Here, {first wireless device 200a, second wireless device 200b} may correspond to {wireless device 100x, base station 120} and/or {wireless device 100x, wireless device 100x} in FIG. 1.

The first wireless device 200a includes one or more processors 202a and one or more memories 204a, and may further include one or more transceivers 206a and/or one or more antennas 208a. Processor 202a controls memory 204a and/or transceiver 206a and may be configured to implement the descriptions, functions, procedures, suggestions, methods and/or operational flowcharts disclosed herein. For example, the processor 202a may process information in the memory 204a to generate first information/signal and then transmit a wireless signal including the first information/signal through the transceiver 206a. Additionally, the processor 202a may receive a wireless signal including the second information/signal through the transceiver 206a and then store information obtained from signal processing of the second information/signal in the memory 204a. The memory 204a may be connected to the processor 202a and may store various information related to the operation of the processor 202a. For example, memory 204a may store software code including instructions for performing some or all of the processes controlled by processor 202a, or for performing the descriptions, functions, procedures, suggestions, methods and/or operational flowcharts disclosed herein. Here, the processor 202a and the memory 204a may be part of a communication modem/circuit/chip designed to implement wireless communication technology (e.g., LTE, NR). Transceiver 206a may be coupled to processor 202a and may transmit and/or receive wireless signals via one or more antennas 208a. Transceiver 206a may include a transmitter and/or receiver. The term transceiver 206a may be used interchangeably with radio frequency (RF) unit. In this disclosure, a wireless device may mean a communication modem/circuit/chip.

The second wireless device 200b includes one or more processors 202b and one or more memories 204b, and may further include one or more transceivers 206b and/or one or more antennas 208b. The processor 202b controls the memory 204b and/or the transceiver 206b and may be configured to implement the descriptions, functions, procedures, proposals, methods, and/or operational flowcharts disclosed herein. For example, the processor 202b may process information in the memory 204b to generate third information/signal and then transmit a wireless signal including the third information/signal through the transceiver 206b. Additionally, the processor 202b may receive a wireless signal including fourth information/signal through the transceiver 206b and then store information obtained from signal processing of the fourth information/signal in the memory 204b. The memory 204b may be connected to the processor 202b and may store various information related to the operation of the processor 202b. For example, the memory 204b may store software code including instructions for performing some or all of the processes controlled by the processor 202b, or for performing the descriptions, functions, procedures, proposals, methods, and/or operational flowcharts disclosed herein. Here, the processor 202b and the memory 204b may be part of a communication modem/circuit/chip designed to implement a wireless communication technology (e.g., LTE, NR). The transceiver 206b may be coupled to the processor 202b and may transmit and/or receive wireless signals via the one or more antennas 208b. The transceiver 206b may include a transmitter and/or a receiver. The term transceiver 206b may be used interchangeably with RF unit. In this disclosure, a wireless device may mean a communication modem/circuit/chip.

Hereinafter, the hardware elements of the wireless devices 200a and 200b will be described in more detail. Although not limited thereto, one or more protocol layers may be implemented by the one or more processors 202a and 202b. For example, the one or more processors 202a and 202b may implement one or more layers (e.g., functional layers such as physical (PHY), media access control (MAC), radio link control (RLC), packet data convergence protocol (PDCP), radio resource control (RRC), and service data adaptation protocol (SDAP)). The one or more processors 202a and 202b may generate one or more protocol data units (PDUs) and/or one or more service data units (SDUs) according to the descriptions, functions, procedures, proposals, methods, and/or operational flowcharts disclosed in this document. The one or more processors 202a and 202b may generate messages, control information, data, or information according to the descriptions, functions, procedures, proposals, methods, and/or operational flowcharts disclosed in this document. The one or more processors 202a and 202b may generate signals (e.g., baseband signals) containing PDUs, SDUs, messages, control information, data, or information according to the functions, procedures, proposals, and/or methods disclosed herein, and may provide them to the one or more transceivers 206a and 206b. The one or more processors 202a and 202b may receive signals (e.g., baseband signals) from the one or more transceivers 206a and 206b and may obtain PDUs, SDUs, messages, control information, data, or information according to the descriptions, functions, procedures, proposals, methods, and/or operational flowcharts disclosed herein.

The one or more processors 202a and 202b may be referred to as a controller, microcontroller, microprocessor, or microcomputer. The one or more processors 202a and 202b may be implemented by hardware, firmware, software, or a combination thereof. As an example, one or more application specific integrated circuits (ASICs), one or more digital signal processors (DSPs), one or more digital signal processing devices (DSPDs), one or more programmable logic devices (PLDs), or one or more field programmable gate arrays (FPGAs) may be included in the one or more processors 202a and 202b. The descriptions, functions, procedures, proposals, methods, and/or operational flowcharts disclosed in this document may be implemented using firmware or software, and the firmware or software may be implemented to include modules, procedures, functions, etc. Firmware or software configured to perform the descriptions, functions, procedures, proposals, methods, and/or operational flowcharts disclosed in this document may be included in the one or more processors 202a and 202b, or may be stored in the one or more memories 204a and 204b and driven by the one or more processors 202a and 202b. The descriptions, functions, procedures, proposals, methods, and/or operational flowcharts disclosed in this document may be implemented using firmware or software in the form of code, instructions, and/or sets of instructions.

The one or more memories 204a and 204b may be connected to the one or more processors 202a and 202b and may store various types of data, signals, messages, information, programs, code, instructions, and/or commands. The one or more memories 204a and 204b may be composed of read only memory (ROM), random access memory (RAM), erasable programmable read only memory (EPROM), flash memory, hard drives, registers, cache memory, computer-readable storage media, and/or a combination thereof. The one or more memories 204a and 204b may be located inside and/or outside the one or more processors 202a and 202b. Additionally, the one or more memories 204a and 204b may be connected to the one or more processors 202a and 202b through various technologies, such as wired or wireless connections.

The one or more transceivers 206a and 206b may transmit user data, control information, wireless signals/channels, etc. mentioned in the methods and/or operational flowcharts of this document to one or more other devices. The one or more transceivers 206a and 206b may receive user data, control information, wireless signals/channels, etc. mentioned in the descriptions, functions, procedures, proposals, methods, and/or operational flowcharts disclosed herein from one or more other devices. For example, the one or more transceivers 206a and 206b may be connected to the one or more processors 202a and 202b and may transmit and receive wireless signals. For example, the one or more processors 202a and 202b may control the one or more transceivers 206a and 206b to transmit user data, control information, or wireless signals to one or more other devices. Additionally, the one or more processors 202a and 202b may control the one or more transceivers 206a and 206b to receive user data, control information, or wireless signals from one or more other devices. In addition, the one or more transceivers 206a and 206b may be connected to the one or more antennas 208a and 208b, and may be configured to transmit and receive, through the one or more antennas 208a and 208b, the user data, control information, wireless signals/channels, etc. mentioned in the descriptions, functions, procedures, proposals, methods, and/or operational flowcharts disclosed in this document. In this document, the one or more antennas may be multiple physical antennas or multiple logical antennas (e.g., antenna ports). So that received user data, control information, wireless signals/channels, etc. can be processed by the one or more processors 202a and 202b, the one or more transceivers 206a and 206b may convert the received wireless signals/channels from RF band signals to baseband signals. The one or more transceivers 206a and 206b may also convert user data, control information, wireless signals/channels, etc. processed by the one or more processors 202a and 202b from baseband signals to RF band signals. To this end, the one or more transceivers 206a and 206b may include (analog) oscillators and/or filters.

Wireless device structure applicable to the present disclosure

FIG. 3 is a diagram illustrating another example of a wireless device applied to the present disclosure.

Referring to FIG. 3, the wireless device 300 corresponds to the wireless devices 200a and 200b of FIG. 2 and may be composed of various elements, components, units, and/or modules. For example, the wireless device 300 may include a communication unit 310, a control unit 320, a memory unit 330, and an additional element 340. The communication unit 310 may include communication circuitry 312 and transceiver(s) 314. For example, the communication circuitry 312 may include the one or more processors 202a and 202b and/or the one or more memories 204a and 204b of FIG. 2. For example, the transceiver(s) 314 may include the one or more transceivers 206a and 206b and/or the one or more antennas 208a and 208b of FIG. 2. The control unit 320 is electrically connected to the communication unit 310, the memory unit 330, and the additional element 340 and controls the overall operation of the wireless device. For example, the control unit 320 may control the electrical/mechanical operation of the wireless device based on the programs/code/commands/information stored in the memory unit 330. In addition, the control unit 320 may transmit information stored in the memory unit 330 to the outside (e.g., another communication device) through the communication unit 310 via a wireless/wired interface, or may store, in the memory unit 330, information received from the outside (e.g., another communication device) through the communication unit 310 via a wireless/wired interface.

The additional element 340 may be configured in various ways depending on the type of wireless device. For example, the additional element 340 may include at least one of a power unit/battery, an input/output unit, a driving unit, and a computing unit. Although not limited thereto, the wireless device 300 may be implemented in the form of a robot (FIG. 1, 100a), a vehicle (FIG. 1, 100b-1, 100b-2), an XR device (FIG. 1, 100c), a portable device (FIG. 1, 100d), a home appliance (FIG. 1, 100e), an IoT device (FIG. 1, 100f), a digital broadcasting terminal, a hologram device, a public safety device, an MTC device, a medical device, a fintech device (or financial device), a security device, a climate/environmental device, an AI server/device (FIG. 1, 140), a base station (FIG. 1, 120), a network node, etc. The wireless device may be mobile or used in a fixed location depending on the use case/service.

In FIG. 3, the various elements, components, units, and/or modules within the wireless device 300 may be entirely interconnected through a wired interface, or at least some of them may be wirelessly connected through the communication unit 310. For example, within the wireless device 300, the control unit 320 and the communication unit 310 may be connected by wire, and the control unit 320 and a first unit (e.g., 130, 140) may be connected wirelessly through the communication unit 310. Additionally, each element, component, unit, and/or module within the wireless device 300 may further include one or more elements. For example, the control unit 320 may be composed of a set of one or more processors, such as a communication control processor, an application processor, an electronic control unit (ECU), a graphics processor, and a memory control processor. As another example, the memory unit 330 may be composed of RAM, dynamic RAM (DRAM), ROM, flash memory, volatile memory, non-volatile memory, and/or a combination thereof.

FIG. 4 is a diagram showing an example of an AI device applied to the present disclosure. As an example, the AI device may be implemented as a fixed device or a movable device, such as a TV, projector, smartphone, PC, laptop, digital broadcasting terminal, tablet PC, wearable device, set-top box (STB), radio, washing machine, refrigerator, digital signage, robot, or vehicle.

Referring to FIG. 4, the AI device 400 may include a communication unit 410, a control unit 420, a memory unit 430, an input/output unit 440a/440b, a learning processor unit 440c, and a sensor unit 440d.

The communication unit 410 may use wired and wireless communication technologies to transmit and receive wired and wireless signals (e.g., sensor information, user input, learning models, control signals, etc.). To this end, the communication unit 410 may transmit information in the memory unit 430 to an external device or deliver a signal received from an external device to the memory unit 430.

The control unit 420 may determine at least one executable operation of the AI device 400 based on information determined or generated using a data analysis algorithm or a machine learning algorithm. The control unit 420 can then control the components of the AI device 400 to perform the determined operation. For example, the control unit 420 may request, search for, receive, or utilize data from the learning processor unit 440c or the memory unit 430, and may control the components of the AI device 400 to execute an operation that is predicted, or determined to be desirable, among the at least one executable operation. In addition, the control unit 420 may collect history information including the operation content of the AI device 400 or the user's feedback on the operation, and may store it in the memory unit 430 or the learning processor unit 440c, or transmit it to an external device such as the AI server (FIG. 1, 140). The collected history information can be used to update the learning model.

The memory unit 430 can store data supporting various functions of the AI device 400. For example, the memory unit 430 may store data obtained from the input unit 440a, data obtained from the communication unit 410, output data from the learning processor unit 440c, and data obtained from the sensor unit 440d. Additionally, the memory unit 430 may store control information and/or software code necessary for the operation/execution of the control unit 420.

The input unit 440a can obtain various types of data from outside the AI device 400. For example, the input unit 440a may obtain training data for model training and input data to which the learning model will be applied. The input unit 440a may include a camera, a microphone, and/or a user input unit. The output unit 440b may generate output related to vision, hearing, or touch. The output unit 440b may include a display unit, a speaker, and/or a haptic module. The sensor unit 440d may obtain at least one of internal information of the AI device 400, surrounding environment information of the AI device 400, and user information using various sensors. The sensor unit 440d may include a proximity sensor, an illumination sensor, an acceleration sensor, a magnetic sensor, a gyro sensor, an inertial sensor, an RGB sensor, an IR sensor, a fingerprint recognition sensor, an ultrasonic sensor, an optical sensor, a microphone, and/or a radar.

The learning processor unit 440c can train a model composed of an artificial neural network using training data. The learning processor unit 440c may perform AI processing together with the learning processor unit of the AI server (FIG. 1, 140). The learning processor unit 440c may process information received from an external device through the communication unit 410 and/or information stored in the memory unit 430. Additionally, the output value of the learning processor unit 440c may be transmitted to an external device through the communication unit 410 and/or stored in the memory unit 430.

6G communication system

The 6G (wireless communication) system aims at (i) very high data rates per device, (ii) a very large number of connected devices, (iii) global connectivity, (iv) very low latency, (v) lower energy consumption for battery-free IoT devices, (vi) ultra-reliable connectivity, and (vii) connected intelligence with machine learning capabilities. The vision of the 6G system can be summarized in four aspects: “intelligent connectivity”, “deep connectivity”, “holographic connectivity”, and “ubiquitous connectivity”. The 6G system can satisfy the requirements shown in Table 1 below; that is, Table 1 shows the requirements of the 6G system.

At this time, the 6G system may have key factors such as enhanced mobile broadband (eMBB), ultra-reliable low latency communications (URLLC), massive machine type communications (mMTC), AI integrated communication, tactile internet, high throughput, high network capacity, high energy efficiency, low backhaul and access network congestion, and enhanced data security.

- Artificial intelligence (AI)

The most important and newly introduced technology in the 6G system is AI. AI was not involved in the 4G system. 5G systems will support partial or very limited AI. However, 6G systems will be AI-enabled for full automation. Advances in machine learning will create more intelligent networks for real-time communications in 6G. Introducing AI in communications can simplify and improve real-time data transmission. AI can use numerous analytics to determine how complex target tasks are performed. In other words, AI can increase efficiency and reduce processing delays.

Time-consuming tasks such as handover, network selection, and resource scheduling can be performed instantly by using AI. AI can also play an important role in M2M, machine-to-human and human-to-machine communications. Additionally, AI can enable rapid communication in BCI (brain computer interface). AI-based communication systems can be supported by metamaterials, intelligent structures, intelligent networks, intelligent devices, intelligent cognitive radios, self-sustaining wireless networks, and machine learning.

Recently, attempts have been made to integrate AI with wireless communication systems, but these have focused on the application layer and network layer; in particular, deep learning has been concentrated on wireless resource management and allocation. However, this research is gradually advancing to the MAC layer and physical layer, and attempts are being made to combine deep learning with wireless transmission, especially in the physical layer. AI-based physical layer transmission means applying signal processing and communication mechanisms that are driven by AI, rather than by traditional communication frameworks, at the level of fundamental signal processing and communication. For example, it may include deep learning-based channel coding and decoding, deep learning-based signal estimation and detection, deep learning-based multiple input multiple output (MIMO) mechanisms, and AI-based resource scheduling and allocation.

Additionally, machine learning can be used for channel estimation and channel tracking, and can be used for power allocation, interference cancellation, etc. in the physical layer of the DL (downlink). Machine learning can also be used for antenna selection, power control, and symbol detection in MIMO systems.

However, application of DNN for transmission in the physical layer may have the following problems.

Deep learning-based AI algorithms require a large amount of training data to optimize training parameters. However, because of limitations in acquiring data from a specific channel environment as training data, much of the training data is used offline. As a result, static training on data from a specific channel environment may conflict with the dynamic characteristics and diversity of the wireless channel.

Additionally, current deep learning mainly targets real signals. However, signals of the physical layer of wireless communication are complex signals. In order to match the characteristics of wireless communication signals, more research is needed on neural networks that detect complex domain signals.

Below, we will look at machine learning in more detail.

Machine learning refers to a series of operations that train machines so that they can perform tasks that are difficult for humans to perform. Machine learning requires data and a learning model. In machine learning, data learning methods can be broadly divided into three types: supervised learning, unsupervised learning, and reinforcement learning.

Neural network learning aims to minimize the error in the output. It is the process of repeatedly inputting training data into the neural network, calculating the error between the network's output and the target for that training data, and backpropagating the error from the output layer toward the input layer so as to update the weight of each node in the direction that reduces the error.

Supervised learning uses training data in which the correct answer is labeled, while unsupervised learning may not have the correct answer labeled in the training data. That is, for example, in the case of supervised learning on data classification, the learning data may be data in which each training data is labeled with a category. Labeled learning data is input to a neural network, and error can be calculated by comparing the output (category) of the neural network with the label of the learning data. The calculated error is backpropagated in the reverse direction (i.e., from the output layer to the input layer) in the neural network, and the connection weight of each node in each layer of the neural network can be updated according to backpropagation. The amount of change in the connection weight of each updated node may be determined according to the learning rate. The neural network's calculation of input data and backpropagation of errors can constitute a learning cycle (epoch). The learning rate may be applied differently depending on the number of repetitions of the learning cycle of the neural network. For example, in the early stages of neural network training, a high learning rate can be used to ensure that the neural network quickly achieves a certain level of performance to increase efficiency, and in the later stages of training, a low learning rate can be used to increase accuracy.

Learning methods may vary depending on the characteristics of the data. For example, in a communication system, when the goal is to accurately predict data transmitted from a transmitter at a receiver, it is preferable to perform learning using supervised learning rather than unsupervised learning or reinforcement learning.

The learning model corresponds to the human brain, and the most basic learning model that can be considered is a linear model. A machine learning paradigm that uses a highly complex neural network structure, such as an artificial neural network, as the learning model is called deep learning.

The neural network cores used as learning methods are broadly divided into deep neural networks (DNN), convolutional deep neural networks (CNN), and recurrent neural networks (RNN), and these learning models can be applied.

Shannon and Weaver explain communication by dividing it into three levels. Level 1 is the technical problem of whether the symbols for communication are transmitted accurately. Level 2 is the semantic problem of how precisely the transmitted symbols convey the intended meaning. Level 3 is the effectiveness problem of how effectively the received meaning affects conduct in the desired way. FIG. 5 shows an example of a communication model divided into these three levels.

One of the many goals of 6G communications is to provide services that can interconnect humans and machines. As one of the next-generation wireless communication paradigms for this purpose, semantic communication based on the concept of "transferring meaning" has emerged. Existing communication focuses on the receiver (e.g., destination) decoding the encoded signal received from the transmitter (e.g., source) into the original signal without error. Semantic communication, on the other hand, focuses on the meaning that the signal is intended to convey, much as people exchange information through the ‘meaning’ of words when they communicate.

The core of semantic communication is to extract the “meaning” of the information to be transmitted at the transmitting end. Semantic information can be successfully “interpreted” at the receiving end based on a knowledge base (KB) that is consistent between the source and the destination. Accordingly, even if the signal contains an error, communication is considered to have been performed correctly as long as the operation is carried out according to the meaning the signal was intended to convey. Therefore, in semantic communication, it is necessary to assess whether the downstream task located at the destination is performed as intended by the signal (e.g., representation) transmitted from the source. Additionally, when the destination performs an inference operation using the signal transmitted from the source, it interprets the meaning conveyed by the source (e.g., the purpose of the downstream task) based on the background knowledge it possesses. Accordingly, for the destination to act in line with the meaning conveyed by the source, based on the result inferred from the transmitted signal, the background knowledge contained in the signal transmitted from the source must be able to be reflected (updated) in the background knowledge of the destination. To achieve this, the transmitted signal must be generated in consideration of the downstream task located at the destination. Such a task-oriented semantic communication system can provide the advantage of preserving task-relevant information while introducing invariances that are useful for downstream tasks.

Referring to FIG. 6, the operations of the transmitting end 610 and the receiving end 620 for semantic communication can be seen. The Shannon entropy of the world model can be expressed as Equation 1 below. This Shannon entropy may be the model entropy of a semantic source.

[Equation 1]

Here, the world model is a set of interpretations equipped with a probability distribution, which is the model distribution. When the set of models in which x is true is considered, the logical probability m(x) of message x can be expressed as Equation 2 below.

[Equation 2]

The semantic entropy of message x can be expressed as Equation 3 below.

[Equation 3]

At this time, when background knowledge k is considered, the set of possible worlds in Equation 2 and Equation 3 may be limited to the set compatible with k. Therefore, conditional logical probabilities can be expressed as shown in Equations 4 and 5 below.

[Equation 4]

[Equation 5]
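The equation bodies are not reproduced in this text. As a reading aid only, the following is a sketch of the standard formulation from semantic information theory that the surrounding definitions are consistent with; it is an assumption, not a transcription of Equations 1 to 5. The symbols W (set of interpretations), \mu (model distribution), W_x (worlds in which x is true), and W_k (worlds compatible with k) are assumed notation.

```latex
% Assumed notation: W = set of interpretations (possible worlds),
% \mu = model distribution, w \models x means x is true in world w.
\begin{align}
H(W) &= -\sum_{w \in W} \mu(w)\,\log_2 \mu(w)
  && \text{(model entropy of the source)} \\
m(x) &= \frac{\mu(W_x)}{\mu(W)}
      = \frac{\sum_{w \in W,\; w \models x} \mu(w)}{\sum_{w \in W} \mu(w)}
  && \text{(logical probability of } x\text{)} \\
H_s(x) &= -\log_2 m(x)
  && \text{(semantic entropy of } x\text{)} \\
m(x \mid k) &= \frac{\mu(W_{k,x})}{\mu(W_k)}
      = \frac{\sum_{w \in W,\; w \models k,\; w \models x} \mu(w)}{\sum_{w \in W,\; w \models k} \mu(w)}
  && \text{(conditional logical probability)} \\
H_s(x \mid k) &= -\log_2 m(x \mid k)
  && \text{(conditional semantic entropy)}
\end{align}
```

Under this reading, restricting the sums to worlds compatible with k is exactly the "limited set of possible worlds" described in the text.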

As an example, Table 2 below illustrates a truth table where p denotes the statistical probability and k denotes the background knowledge. Specifically, Table 2 is an example of a truth table where p(A)=p(B)=0.5 and K={A->B}.

According to Table 2, the possible worlds can be reduced to the set of truth assignments (e.g., cases 1, 2, and 4 in Table 2) in which A->B is true. Therefore, conditional logical probabilities such as Equations 6, 7, and 8 below can be obtained.

[Equation 6]

[Equation 7]

[Equation 8]

Logical probabilities differ from a priori statistical probabilities because they are based on background knowledge, and in the new distribution, A and B are no longer logically independent.
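The truth-table example can be worked through numerically. The sketch below assumes that A and B are statistically independent with p(A)=p(B)=0.5, that the background knowledge is A->B, and that the conditional logical probability is the probability mass of the worlds satisfying both the message and the knowledge divided by the mass of the worlds satisfying the knowledge. The helper names are hypothetical, and because Equations 6 to 8 are not reproduced in this text, agreement with them is not verified.

```python
from fractions import Fraction
from itertools import product

# Statistical probabilities from the example (A and B assumed independent).
p = {"A": Fraction(1, 2), "B": Fraction(1, 2)}

# All four truth assignments ("possible worlds") over A and B.
worlds = [dict(zip("AB", values)) for values in product([True, False], repeat=2)]


def world_prob(world):
    """Probability of one truth assignment under independent A and B."""
    prob = Fraction(1)
    for name, value in world.items():
        prob *= p[name] if value else 1 - p[name]
    return prob


def implies(world):
    """Background knowledge k: A -> B."""
    return (not world["A"]) or world["B"]


def logical_prob(message, knowledge=lambda w: True):
    """Mass of worlds satisfying message and knowledge over mass satisfying knowledge."""
    compatible = [w for w in worlds if knowledge(w)]
    total = sum(world_prob(w) for w in compatible)
    hit = sum(world_prob(w) for w in compatible if message(w))
    return hit / total


m_a = logical_prob(lambda w: w["A"], implies)
m_b = logical_prob(lambda w: w["B"], implies)
m_ab = logical_prob(lambda w: w["A"] and w["B"], implies)
print(m_a, m_b, m_ab)  # prints: 1/3 2/3 1/3
```

Since m_ab differs from the product m_a * m_b (1/3 versus 2/9), A and B are no longer independent once the worlds are restricted to those in which A->B holds, which matches the statement above.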

Meanwhile, the new distribution of the model set when background knowledge k exists can be expressed as Equations 9 and 10 below.

[Equation 9]

[Equation 10]

Equation 11 below represents the entropy of the source without considering background knowledge, and Equation 12 below represents the model entropy of the source considering background knowledge.

[Equation 11]

[Equation 12]

As in Equation 11 and Equation 12, the source can compress the message it wants to convey, without omitting information, by using shared background knowledge. In other words, through shared background knowledge, the source and destination can exchange the maximum amount of information with a small data volume. One of the main reasons why communication at the semantic level can improve performance over communication at the existing technical level is that background knowledge is taken into account. Therefore, the present disclosure proposes a method for generating, transmitting, and receiving signals in consideration of background knowledge, so that they are suitable for the downstream task located at the destination, in order to perform semantic communication.
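The compression effect can be illustrated with the same toy world model as the truth-table example. The sketch assumes four equally likely worlds over A and B, background knowledge A->B, and a model distribution that is renormalized over the worlds compatible with that knowledge; these choices and all names are assumptions for illustration, not the source's Equations 9 to 12.

```python
import math
from itertools import product


def entropy(dist):
    """Shannon entropy in bits of a distribution given as {outcome: probability}."""
    return -sum(prob * math.log2(prob) for prob in dist.values() if prob > 0)


# Four worlds (A, B), each with probability 0.25 since p(A) = p(B) = 0.5.
worlds = {(a, b): 0.25 for a, b in product([True, False], repeat=2)}

# Keep only the worlds compatible with the background knowledge A -> B,
# then renormalize so the remaining probabilities sum to 1.
compatible = {w: prob for w, prob in worlds.items() if (not w[0]) or w[1]}
mass = sum(compatible.values())
conditioned = {w: prob / mass for w, prob in compatible.items()}

h_without = entropy(worlds)       # entropy ignoring background knowledge
h_with = entropy(conditioned)     # entropy given background knowledge
print(h_without, h_with)
```

In this toy case the entropy falls from 2 bits to log2(3) bits, so fewer bits are needed on average to identify the world when both ends share the knowledge.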

According to an embodiment of the present disclosure, a semantic layer, a new layer that manages overall operations on semantic data and messages, may be added. The semantic layer is a layer for a task-oriented semantic communication system and can be used to generate, transmit, and receive signals between the source and the destination. To communicate through the semantic layer, it may be necessary to define a protocol between layers and a series of operating procedures, which are described below.

Meanwhile, the majority of raw data held or collected by sources in an actual communication environment is unlabeled data. Labeling such data may incur additional costs. Therefore, contrastive learning, an artificial intelligence (AI)/machine learning (ML) technique, can be used to perform communication using the unlabeled data as-is. In the following, contrastive learning, a technique that can be applied to semantic systems, is described. As an example, contrastive learning can be introduced into the semantic layer to perform semantic communication.

Contrastive learning is a method of learning correlations between data through a representation space. Specifically, through contrastive learning, high-dimensional data can be converted to low-dimensional data (e.g., dimension reduction) and placed in the representation space. The similarity between data can then be measured based on the position of each data item in the representation space. For example, through contrastive learning, a semantic communication system can learn representations so that positive pairs lie close to each other and negative pairs lie far from each other. A positive pair is a pair of similar data, and a negative pair is a pair of dissimilar data. Contrastive learning can be applied to both supervised and unsupervised learning, but it is especially useful when learning is performed on unlabeled data. Therefore, contrastive learning is suitable for building a task-oriented semantic communication system in a real environment where unlabeled data accounts for the majority.
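Measuring similarity in a representation space can be sketched as follows. The sketch assumes cosine similarity as the measure and uses hand-written three-dimensional vectors in place of real encoder outputs; the vectors, names, and choice of metric are hypothetical, since the disclosure does not fix a particular similarity function here.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two representation vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical low-dimensional representations produced by some encoder.
anchor = [0.9, 0.1, 0.0]     # reference data
positive = [0.8, 0.2, 0.1]   # similar data (forms a positive pair with the anchor)
negative = [0.0, 0.2, 0.9]   # dissimilar data (forms a negative pair with the anchor)

sim_pos = cosine_similarity(anchor, positive)
sim_neg = cosine_similarity(anchor, negative)
print(sim_pos > sim_neg)
```

A contrastive training objective would adjust the encoder so that `sim_pos` increases and `sim_neg` decreases; here the vectors are fixed and only the measurement step is shown.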

As an example, FIG. 7 shows a case where contrastive learning is performed based on a giraffe image. However, this is only an example for convenience of explanation, and the disclosure is not limited to it. Referring to FIG. 7, the contrastive learning operation performed when the target task is a classification task and the data modality is an image can be seen. The reference query for classifying the image data is the giraffe image. Representations of giraffe images can be learned so that they lie close to the query's representation, and representations of images other than giraffe images can be learned so that they lie far from the query's representation. In other words, the contrastive learning technique trains the encoder so that data similar to the reference data is mapped nearby and data dissimilar to the reference data is mapped far away.

FIG. 8 shows an example of instance discrimination 800 for contrastive learning according to an embodiment of the present disclosure. A model that performs contrastive learning can learn from data through instance discrimination 800.

An instance refers to each data sample used for training. As an example, an instance may be a sample of image data of a specific size or a sample of text data in sentence units. Instance discrimination classifies data by treating every instance in the entire data set as its own class. Therefore, if there are N instances, discrimination is performed over N classes. Instance discrimination learns the differences between instances based on the similarity between them, which provides the advantage of obtaining useful representations of data without label information. If downstream tasks are performed using representations learned through instance discrimination, the model's performance can improve as if a supervised learning method had been used.

Meanwhile, in instance discrimination, the amount of discrimination work grows significantly as the number of data samples increases. For example, if there are 10 million data samples, discrimination over 10 million classes may be performed. As the number of data samples increases, the denominator of the softmax used to compute probabilities grows and the probability values shrink, which makes learning difficult. To solve this problem, noise-contrastive estimation (NCE) can be used as an approximation method. Through NCE, the multi-class classification operation can be replaced with a binary classification operation that determines whether a sample is a data sample or a noise sample.
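The contrast between the two formulations can be sketched as follows. This is a minimal illustration, not the disclosed method: the function names are hypothetical, and the binary loss is a simplified NCE-style (negative-sampling) form that omits the noise-distribution correction term of full NCE.

```python
import numpy as np

def instance_softmax_loss(scores: np.ndarray, target: int) -> float:
    """Cross-entropy over all N instances; the normalizer sums over N terms."""
    log_normalizer = np.logaddexp.reduce(scores)  # log of the softmax denominator
    return float(log_normalizer - scores[target])

def nce_style_binary_loss(pos_score: float, noise_scores: np.ndarray) -> float:
    """Binary logistic loss: the data sample is labeled 1, noise samples 0.

    Simplification (assumption): the noise-probability correction used by
    full NCE is dropped, so only raw scores enter the logistic terms.
    """
    pos_term = np.logaddexp(0.0, -pos_score)           # -log sigmoid(pos_score)
    noise_term = np.logaddexp(0.0, noise_scores).sum() # -log(1 - sigmoid(noise))
    return float(pos_term + noise_term)
```

With uninformative (all-zero) scores, the softmax loss grows as log N with the number of instances, while the binary loss depends only on the chosen number of noise samples.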

In order for NCE to be performed, a comparison method needs to be defined that determines, for a reference sample, whether a given sample is a similar sample (hereinafter referred to as 'positive sample') or a dissimilar sample (hereinafter referred to as 'negative sample'). One method for generating positive samples is data augmentation (hereinafter referred to as 'augmentation'). Augmentation creates new data by modifying existing data. From a semantic perspective, augmented data (hereinafter referred to as 'augmentation data') carries the same meaning that the existing data is intended to convey. In other words, the information included in the existing data and in the augmentation data is the same, so their representations should be similar. Therefore, the existing data and its augmentation data can be defined as positive samples, and all other samples can be defined as negative samples.

FIG. 9 shows the results of performing augmentation on a dog image. For example, image data can be augmented by cropping, resizing, flipping, changing the color of, or rotating a portion of the image.
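As a rough illustration of how two augmented views of one image can form a positive pair, the sketch below applies a random horizontal flip and a random crop. The function names, the 3/4 crop size, and the choice of these two transformations are assumptions for illustration only.

```python
import numpy as np

def augment_image(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Return a randomly flipped and cropped view of an (H, W, C) image."""
    # Horizontal flip with probability 0.5.
    view = image[:, ::-1, :] if rng.random() < 0.5 else image
    h, w, _ = view.shape
    crop_h, crop_w = h * 3 // 4, w * 3 // 4  # assumed crop size: 3/4 of each side
    top = rng.integers(0, h - crop_h + 1)
    left = rng.integers(0, w - crop_w + 1)
    return view[top:top + crop_h, left:left + crop_w, :].copy()

def make_positive_pair(image: np.ndarray, seed: int = 0):
    """Two independently augmented views of the same image (a positive pair)."""
    rng = np.random.default_rng(seed)
    return augment_image(image, rng), augment_image(image, rng)
```

Both views come from the same source image, so they are treated as carrying the same meaning; views of other images in the batch would serve as negative samples.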

For contrastive learning, the NCE loss function of Equation 13 below can be used.

[Equation 13]

In Equation 13, $x$ is the reference data (query data), $x^{+}$ is data related to or similar to $x$, and $x^{-}$ is data unrelated to or dissimilar to $x$.
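For illustration, one widely used NCE-type contrastive loss (often called InfoNCE) can be sketched as below. The exact form of Equation 13 may differ; the cosine-similarity scoring, the `temperature` parameter, and the function name are assumptions made for this sketch.

```python
import numpy as np

def info_nce_loss(query, positive, negatives, temperature=0.1):
    """InfoNCE-style loss for one query.

    query:     (d,)   representation of the reference data
    positive:  (d,)   representation of a similar sample
    negatives: (k, d) representations of dissimilar samples
    """
    def unit(v):
        return v / np.linalg.norm(v, axis=-1, keepdims=True)

    q, p, n = unit(np.asarray(query, float)), unit(np.asarray(positive, float)), unit(np.asarray(negatives, float))
    pos_logit = np.dot(q, p) / temperature   # similarity to the positive sample
    neg_logits = n @ q / temperature         # similarities to the negative samples
    logits = np.concatenate([[pos_logit], neg_logits])
    # -log( exp(pos) / (exp(pos) + sum exp(neg)) )
    return float(np.logaddexp.reduce(logits) - pos_logit)
```

The loss is small when the query is closer to its positive sample than to every negative sample, and grows when a negative sample is closer than the positive one.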

As described above, contrastive learning provides the advantage of learning useful representations from the unlabeled data itself. Therefore, contrastive learning can be applied to semantic communication as the AI/ML technique of an encoder that performs semantic source coding. In addition, the background knowledge possessed by the source and the destination must be used appropriately so that a representation based on the embedding space can be created from the data, and information about the positive and negative samples from which the model learns needs to be reflected in the background knowledge of both the source and the destination. On the other hand, contrastive learning may have the following problems.

Referring to FIG. 10, the class collision problem that may occur when contrastive learning is performed can be seen for a case where the target task is a classification task based on dog images and the input data modality is an image. For the classification task to succeed after contrastive learning, the representations 1012, 1022, and 1032 of dog images should be located close to each other, and the representations 1042, 1044, and 1052 of images of objects other than dogs should be located far from the representations 1012, 1022, and 1032 of the dog images. However, in FIG. 10, while the representation 1022 of the positive sample 1020 generated by augmenting the anchor image 1010 is located close to the representation 1012 of the anchor image, the representation 1032 of another sample 1030, which is not the same image as the anchor image 1010 but is also a dog image, is located far from the representation 1012 of the anchor image. That is, although the other sample 1030 should be classified with the same label as the anchor image 1010, it is treated as a negative sample and is located far from the representations of the dog images. Accordingly, when contrastive learning is used, a problem may arise in which the output of the downstream task at the destination differs from the result intended by the source. In addition, even if the data contains label information, which is the correct answer to be obtained through the downstream task, contrastive learning may learn incorrectly because it ignores the label. Therefore, for accurate learning, if the data has labels, contrastive learning needs to take them into account.

Accordingly, this disclosure proposes a framework and related procedures for a semantic communication system that uses weakly-supervised contrastive learning. This addresses the problems that may occur when contrastive learning is performed. The source learns representations from the acquired data and delivers them to the destination, and the destination can perform downstream tasks as intended by the source on the received representations without restoring them.

Additionally, according to the present disclosure, when the data includes auxiliary information (e.g., data attributes, hashtags for images on Instagram, metadata such as an individual's identity and nationality, a text description of an image, etc.), the representation learning performance of the source and the performance of the downstream task located at the destination can be improved by performing contrastive learning using the auxiliary information.

In addition, the semantic representations generated with the improved representation learning performance of the source are reflected in the background knowledge of the source and of the destination, so that reasoning performance using background knowledge can also be improved.

The framework proposed in this disclosure may include a pre-training operation for semantic source coding and a training operation for the downstream tasks of the destination. Here, semantic source coding is an operation in which the source generates a signal (e.g., a representation) to be transmitted to the destination. Through the present disclosure, a transmission/reception signal can be generated in consideration of the downstream task to be performed at the destination, and the downstream task can be performed as intended by the source. At this time, the source and destination can share background knowledge. Once pre-training and training for the downstream tasks are completed, inference can be performed.

Meanwhile, the present disclosure may be applied to a signal transmission/reception protocol using a semantic layer that can be newly added in a task-oriented semantic communication system, but is not limited thereto, and may be applied to frameworks and related procedures for task-oriented semantic communication using contrastive learning.

FIG. 11 shows an example framework for pre-training according to an embodiment of the present disclosure. The framework for pre-training may be composed of operations of the source 1110 and the destination 1120. At this time, the transform head 1150 may be used as one of the encoding models. Steps S1101 to S1107 described below are operations performed at the source, and steps S1109 and S1111 are operations performed at the destination. Here, pre-training can be performed in mini-batch units.

Referring to FIG. 11, in step S1101, the source 1110 may obtain semantic data 1114 from raw data 1112. The semantic data 1114 is data extracted from the raw data 1112 and can be used to generate a message (e.g., a representation) containing the 'meaning' that the source 1110 wants to convey to the destination 1120. At this time, the unit in which the semantic data 1114 is acquired may be determined using the background knowledge 1130 and 1140 held by the source 1110 and the destination 1120.

As an example, as shown in FIG. 12, when the background knowledge includes a biomedicine knowledge graph and the source obtains semantic data in query format from raw data, semantic data acquisition units such as 'query related to the corresponding biomedicine field', 'type of query', and 'length of query' may be determined based on the biomedicine knowledge graph. As another example, when the source acquires semantic data in text format from raw data, the semantic data acquisition unit, such as whether to transmit data in sentence units or paragraph units, can be set based on background knowledge related to text data.

In step S1103, the source 1110 may perform weak labeling on the semantic data 1114. Weak labeling may be an operation that classifies data using auxiliary information. In other words, when weak labeling is performed, a weak label may be assigned to the data. As an example, if auxiliary information exists in the raw data or in background knowledge built from the raw data, the source 1110 may perform weak labeling on the semantic data 1114 using the auxiliary information. At this time, information included in the background knowledge can be used. The method of performing weak labeling is explained in detail with reference to FIGS. 13 and 14 below. Weak labels may be noisier than ordinary labels, but weak labeling has the advantage of very low cost compared to annotation by humans.

FIGS. 13 and 14 illustrate examples of weak labeling based on auxiliary information according to an embodiment of the present disclosure.

According to one embodiment, the auxiliary information may be a discrete attribute of the data (hereinafter referred to as 'discrete attribute'). That is, discrete attributes may be selected as the auxiliary information used to perform weak labeling. For example, for an image of a person, binary indicators such as 'short/long hair' and 'short sleeves/long sleeves', which are examples of discrete attributes, can be used as auxiliary information.

FIG. 13 shows an example of weak labeling based on data attributes according to an embodiment of the present disclosure. FIG. 13 illustrates the weak labeling process performed when the data modality is an image. The types of data attributes can vary, and the number of data attributes handled by the source and destination can be determined through initial settings. First, the source can rank the data attributes by measuring their entropy. The source may then construct an attribute set containing the top k attributes with the highest entropy. A high entropy value for an attribute means that its values are widely distributed across the data. The source can then create clusters by clustering on the attribute set so that data within the same cluster have the same values for the attributes in the set. That is, the source can configure an attribute set using auxiliary information and form clusters whose data share the same values over that attribute set. At this time, because the entropy of an attribute may vary as the background knowledge is updated, entropy measurement must be performed in mini-batch units, which are the learning units. Information related to the created clusters can be used to perform weak labeling.
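The attribute-based procedure above (rank attributes by entropy, keep the top k, group samples that agree on those attributes) can be sketched as follows. This is a minimal sketch under stated assumptions: the function names, the dictionary-per-sample input format, and the integer cluster IDs are hypothetical.

```python
from collections import Counter
import math

def attribute_entropy(values):
    """Shannon entropy (bits) of one discrete attribute over a mini-batch."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in Counter(values).values())

def weak_labels_from_attributes(batch, k):
    """Assign a cluster ID (weak label) to every sample in a mini-batch.

    batch: list of dicts mapping attribute name -> discrete value.
    Samples that share the same values on the top-k highest-entropy
    attributes fall into the same cluster.
    """
    names = sorted(batch[0])
    ranked = sorted(
        names,
        key=lambda name: attribute_entropy([sample[name] for sample in batch]),
        reverse=True,
    )
    top_k = ranked[:k]
    cluster_ids = {}
    labels = []
    for sample in batch:
        key = tuple(sample[name] for name in top_k)
        labels.append(cluster_ids.setdefault(key, len(cluster_ids)))
    return top_k, labels
```

Because the ranking is recomputed from the mini-batch passed in, calling the function per mini-batch reflects the requirement that entropy be measured in mini-batch units.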

According to another embodiment, the auxiliary information may be hierarchy information of the data (hereinafter referred to as 'hierarchy information'). That is, hierarchy information can be selected as the auxiliary information used to perform weak labeling.

FIG. 14 shows an example of weak labeling based on data hierarchy information according to an embodiment of the present disclosure. FIG. 14 illustrates the weak labeling operation performed when the data modality is text and the background knowledge includes WordNet. First, the source can organize data into a hierarchical tree structure based on the auxiliary information. At this time, each node at each level of the tree structure may correspond to a coarse label. The source can then form a node set by selecting the nodes corresponding to the $l$-th level. At this time, since the tree structure may change as the background knowledge is updated, the source can select the nodes corresponding to the $l$-th level in mini-batch units, which are the learning units. Additionally, the source may create clusters by clustering on the node set so that data included in the same cluster has the same coarse label. Information related to the created clusters can be used to perform weak labeling.
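The hierarchy-based procedure can be sketched with a toy tree as follows. The tree, the `parent` mapping, and the function names are hypothetical and stand in for a real hierarchy such as WordNet; the root is assumed to be level 0.

```python
def ancestor_at_level(node, parent, level):
    """Return the ancestor of `node` at depth `level` (root is level 0).

    If the node is shallower than `level`, the node itself is returned.
    """
    path = [node]
    while path[-1] in parent:          # walk up to the root
        path.append(parent[path[-1]])
    path.reverse()                     # now ordered root ... node
    return path[min(level, len(path) - 1)]

def weak_labels_from_hierarchy(leaf_labels, parent, level):
    """Coarse label (weak label) of each sample: its ancestor at `level`."""
    return [ancestor_at_level(leaf, parent, level) for leaf in leaf_labels]

# Toy hierarchy (assumption for illustration only).
TOY_PARENT = {
    "dog": "mammal", "cat": "mammal", "mammal": "animal",
    "sparrow": "bird", "bird": "animal",
}
```

For example, at level 1 the samples `dog` and `cat` share the coarse label `mammal` and fall into the same cluster, while `sparrow` falls into the `bird` cluster.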

At this time, the discrete attributes and hierarchy information of data are only examples of auxiliary information used to perform weak labeling, and the auxiliary information is not limited thereto. That is, other auxiliary information (e.g., an asymmetric attribute, which is an attribute with unequal importance) may be used for weak labeling.

Meanwhile, when the source performs clustering based on auxiliary information, the performance of the downstream task located at the destination (e.g., how well the destination interprets the representation received from the source) may depend on hyper-parameter values (e.g., the value of k if the auxiliary information is a data attribute, or the value of $l$ if the auxiliary information is hierarchy information). Therefore, hyper-parameter values need to be determined according to background knowledge (e.g., domain knowledge). In addition, when the background knowledge is updated, the properties of the auxiliary information may change (e.g., the entropy of discrete attributes of data, the tree structure according to data hierarchy information, etc.), so the hyper-parameter values may change during pre-training.

In step S1105, the source 1110 may perform augmentation on the weakly labeled semantic data 1116. The weakly labeled semantic data 1116 may be data included in a cluster created by performing weak labeling on semantic data. Augmentation is a technique that increases the overall amount of data by transforming existing data to create new data. As an example, the source 1110 may augment the weakly labeled semantic data 1116 to generate the positive samples necessary for contrastive learning. At this time, if a mini-batch contains N pieces of semantic data, 2N pieces of augmentation data can be generated.

The type of augmentation may vary depending on the modality of the data. [Table 3] below illustrates the types of augmentation when the data modality is an image.

Category	Type
Geometric Transformations	Transformation using flipping, cropping, rotation, color space, noise injection, etc.
Color Space Transformation	Adjust the brightness by setting one of the R, G, and B values to its minimum or maximum value.
Kernel Filter	Randomly mix the pixels within a region of a set size using a Gaussian filter, edge filter, patch shuffle filter, etc.
Random Erasing	Create a new image by randomly deleting certain parts of the image.
Mixing Images	Create a new image using parts of each of a plurality of images.

[Table 4] below illustrates an augmentation technique when the data modality is text.

Category	Sub-category	Type
Text modification	Random Noise Injection	Synonym Replace (SR), Random Insertion (RI), Random Swap (RS), Random Deletion (RD)
Text generation	Back-Translation	Generate artificial data from monolingual data using a translator - Beam Search, Random Sampling, Top-10 Sampling, Beam + Noise
Text generation	Conditional pre-training using a pre-trained model	Augment text using three types of pre-trained models (Auto-Regressive (AR), Auto-Encoder (AE), and Sequence-to-sequence (Seq2Seq)) - perform fine-tuning by including label information in the pre-trained model
Others	Dropout noise	Based on the same sentence, change only the dropout mask to generate positive pairs with similar embeddings

[Table 5] below illustrates an augmentation technique when the data modality is a graph.

Category	Sub-category	Type
Topology (structure) augmentation	Edge perturbation	Edge Removing (ER), Edge Adding (EA), Edge Flipping (EF)
Topology (structure) augmentation	Node perturbation	Node Dropping (ND)
Topology (structure) augmentation	Subgraph sampling (SS)	Subgraph induced by Random Walks (RWS)
Topology (structure) augmentation	Graph Diffusion (GD)	Diffusion with Personalized PageRank (PPR), Diffusion with Markov Diffusion Kernels (MDK)
Feature augmentation		Feature Masking (FM), Feature Dropout (FD)

Meanwhile, the type of augmentation applied may affect the semantic source coding performance of the encoder 1118. For example, if the modality of the data transmitted by the source 1110 is text and the downstream task located at the destination determines whether a sentence is positive or negative, the meaning that the source 1110 wants to convey depends on the grammatical elements of the text, and an augmentation that alters those elements may prevent the operation from being performed as intended. Therefore, in order to preserve the meaning to be conveyed through text data, the type of augmentation and the ratio of augmentation must be set based on the background knowledge 1130.

Referring to FIG. 15, the performance of edge perturbation on NCI1, which is biochemical molecule data related to chemical compounds, is degraded compared to that on COLLAB, which is social network data. This is because a change of an edge in biomolecule data such as NCI1 corresponds to the removal or addition of a covalent bond, which can significantly change the identity and validity of the compound, and it indicates that the meaning the source 1110 intends to convey to the destination 1120 may not be conveyed correctly. Therefore, so that augmentation such as edge perturbation is not performed on data such as NCI1, the source 1110 or the destination 1120 can set the data augmentation type using the background knowledge 1130. FIG. 15 also shows that performance depends on the perturbation ratio. Therefore, the ratio at which data augmentation is applied also needs to be set using the background knowledge 1130.
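One way to picture the use of background knowledge to select the augmentation type and ratio is the sketch below. The `ALLOWED_AUGMENTATIONS` table, its domain names, the 0.2 ratio, and the function names are all hypothetical; they only illustrate gating edge perturbation by domain.

```python
import random

def normalize_edges(edges):
    """Undirected edges as sorted, de-duplicated (u, v) tuples with u < v."""
    return sorted({tuple(sorted(edge)) for edge in edges})

def perturb_edges(edges, num_nodes, ratio, seed=0):
    """Remove a fraction `ratio` of edges and add the same number of new ones."""
    rng = random.Random(seed)
    edges = normalize_edges(edges)
    n_change = int(len(edges) * ratio)
    kept = rng.sample(edges, len(edges) - n_change)
    existing = set(edges)
    candidates = [
        (u, v)
        for u in range(num_nodes)
        for v in range(u + 1, num_nodes)
        if (u, v) not in existing
    ]
    added = rng.sample(candidates, min(n_change, len(candidates)))
    return sorted(kept + added)

# Hypothetical background-knowledge table: allowed augmentations per domain.
ALLOWED_AUGMENTATIONS = {
    "social_network": {"edge_perturbation": 0.2},
    "molecule": {},  # edge changes would alter covalent bonds, so none allowed
}

def augment_graph(edges, num_nodes, domain):
    """Apply edge perturbation only if the domain's background knowledge allows it."""
    ratio = ALLOWED_AUGMENTATIONS.get(domain, {}).get("edge_perturbation")
    if ratio is None:
        return normalize_edges(edges)
    return perturb_edges(edges, num_nodes, ratio)
```

In this sketch a molecule graph is returned unchanged, while a social-network graph has a configured fraction of its edges replaced.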

Meanwhile, the source 1110 may generate augmentation data 1118 by combining a plurality of augmentation techniques to improve system performance. For example, when the data modality is an image, the source 1110 may augment the data by combining all four augmentation techniques of crop, flip, color jitter, and grayscale. Additionally, the source 1110 may augment data using multiple augmentation techniques belonging to different categories. In fact, when the data modality is a graph, system performance improves more when similar samples are generated using augmentation techniques from multiple categories than when augmentation techniques from a single category are applied. Additionally, the combination of augmentation techniques that achieves the best performance varies depending on the domain of the data. In other words, the type and ratio of augmentation must be set based on the background knowledge 1130 (e.g., domain knowledge) according to the data modality.

In step S1107, the source 1110 may perform encoding on the weakly labeled augmentation data 1116. At this time, an encoder 1118 appropriate to the data modality can be used. For example, if the data modality is an image, a CNN-based model (e.g., ResNet18) may be used, and if the data modality is text, a pre-trained model (e.g., BERT) may be used. For example, the encoders 1118 located in the two branches of the dual-branch structure may be identical. Additionally, when an existing model is used as the encoder 1118, only the components for feature extraction among the components of the encoder 1118 may be used. Here, the components for feature extraction can be used to obtain the representation. The source 1110 may transmit the result generated by encoding (e.g., a representation; hereinafter referred to as 'encoded data') to the destination 1120. At this time, the source 1110 may transmit cluster information corresponding to the weak labels along with the encoded data. Here, the encoded data can be viewed as the semantic message created from semantic data in semantic communication.

Meanwhile, in step S1109, the destination 1120 may perform an additional operation that converts the format of the encoded data to the format of the data used by the downstream task. FIG. 16 shows an example of the additional data conversion operation when the data modality is a graph. Referring to FIG. 16, when the data is encoded, the output may be a node representation 1610. At this time, the destination (e.g., destination 1120 in FIG. 11) may decide whether to perform the additional operation depending on how the downstream task operates. If the downstream task operates on the node representation 1610, the destination may not perform the additional operation. On the other hand, if the downstream task operates on a graph representation, the destination can perform an additional operation that converts the node representation into a graph representation. At this time, the destination may perform the additional operation through a configured readout function 1620 (e.g., average, sum).
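The conditional conversion described above can be sketched as follows. The function names and the `task_input` flag are hypothetical; only the average and sum readouts named in the text are shown.

```python
import numpy as np

def readout(node_repr: np.ndarray, mode: str = "mean") -> np.ndarray:
    """Collapse (num_nodes, d) node representations into one (d,) graph vector."""
    if mode == "mean":
        return node_repr.mean(axis=0)
    if mode == "sum":
        return node_repr.sum(axis=0)
    raise ValueError(f"unsupported readout mode: {mode}")

def prepare_for_task(node_repr: np.ndarray, task_input: str, mode: str = "mean") -> np.ndarray:
    """Skip the additional operation for node-level tasks; apply readout otherwise."""
    if task_input == "node":
        return node_repr
    return readout(node_repr, mode)
```

A node-level downstream task receives the node representations unchanged, while a graph-level task receives a single vector per graph.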

As another example, FIG. 17 shows an example of the additional data conversion operation when the data modality is text. Referring to FIG. 17, text data may be encoded through a pre-trained model (e.g., BERT). As a result of encoding, a set of word vectors, which are word-level representations, can be output. The destination can decide whether to perform the additional operation depending on how the downstream task operates. If the downstream task operates on word-level representations, the destination may not perform the additional operation. On the other hand, if the downstream task operates on a context vector, which is a context-based representation, the destination can perform a pooling operation (e.g., mean, max) to convert the word vectors into a context vector.

As another example, when the data modality is an image, local feature vectors may be output from each branch as the encoding result, and the destination can perform an additional operation to create a global summary vector from one of the branches. At this time, the model can generate the global summary vector in a manner similar to using the readout function when the data modality is a graph.

As in the above embodiments, task-oriented semantic communication can be performed through the additional operations that obtain a representation suited to the purpose of the downstream task located at the destination. This gives the semantic communication system flexibility. At this time, the additional operations in step S1109 can be learned by configuring them as a multi-layer perceptron (MLP).

When step S1109 is completed, in step S1111, the destination 1120 can learn the encoded data (e.g., representations) using a loss function. The transform head 1150 used for learning is described below.

FIG. 18 shows an example of the configuration of a transform head 1800 according to an embodiment of the present disclosure. The transform head 1800 is an example of an encoder for a semantic communication system (e.g., the transform head 1150 in FIG. 11).

Referring to FIG. 18, the transform head 1800 may include, through a projection head technique, at least one dense layer 1811, 1813, 1815 and a rectified linear unit (ReLU) 1812, 1814 corresponding to at least one non-linear function. The structure of the transform head 1800 is not limited to that of FIG. 18, and the number of layers and the non-linear function may vary depending on the encoder model. The reason for configuring the transform head 1800 as shown in FIG. 18 is as follows.
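A forward pass through a head with three dense layers and two ReLU activations, matching the layer count described for FIG. 18, can be sketched as below. The class name, layer widths, and initialization are assumptions for illustration; training is not shown.

```python
import numpy as np

class TransformHead:
    """Dense -> ReLU -> Dense -> ReLU -> Dense projection head (forward pass only)."""

    def __init__(self, dims=(128, 128, 128, 64), seed=0):
        rng = np.random.default_rng(seed)
        # One weight matrix and bias per dense layer; widths are assumed values.
        self.weights = [
            rng.normal(0.0, np.sqrt(2.0 / d_in), size=(d_in, d_out))
            for d_in, d_out in zip(dims[:-1], dims[1:])
        ]
        self.biases = [np.zeros(d_out) for d_out in dims[1:]]

    def __call__(self, x: np.ndarray) -> np.ndarray:
        last = len(self.weights) - 1
        for i, (w, b) in enumerate(zip(self.weights, self.biases)):
            x = x @ w + b
            if i < last:                 # no activation after the final dense layer
                x = np.maximum(x, 0.0)   # ReLU
        return x
```

Passing a batch of encoder outputs of shape `(batch, dims[0])` returns projected representations of shape `(batch, dims[-1])`, on which the contrastive loss would be computed.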

SimCLR-based models calculate the loss using a non-linear projection head. In this case, the performance is better than when a linear projection head or no projection head is used. In addition, the SimCLRv2-based model performs learning with a larger encoder model and a larger number of layers in the projection head. This is because performance improves as the projection head has more layers, particularly when the label fraction is low. Accordingly, the present disclosure proposes a transform head configured as illustrated in FIG. 18 as an encoding model for maximizing the performance of semantic communication through effective embedding learning.

Thereafter, in step S1111, the destination 1120 may learn the encoded data (e.g., representations) using a loss function. At this time, learning can be performed using the Cl-InfoNCE loss (clustering InfoNCE loss). The Cl-InfoNCE loss is an extension of Equation 13: it is a loss function that learns similar representations for data with the same weak label (e.g., data located within the same cluster) and different representations for data with different weak labels (e.g., data located within different clusters). Here, learning similar representations may mean learning to decrease the distance between the representations, and learning different representations may mean learning to increase the distance between them. The Cl-InfoNCE loss is given in Equation 14 below.

[Equation 14]

$$\text{Cl-InfoNCE} := \sup_{f}\; \mathbb{E}_{(x_i, y_i) \sim \left(\mathbb{E}_{z \sim P_Z}\left[P_{X|z} P_{Y|z}\right]\right)^{\otimes n}} \left[ \frac{1}{n} \sum_{i=1}^{n} \log \frac{e^{f(x_i, y_i)}}{\frac{1}{n} \sum_{j=1}^{n} e^{f(x_i, y_j)}} \right]$$

In Equation 14, a variable written in uppercase is a random variable and a variable written in lowercase is an outcome of that random variable. X and Y are representations generated from the weakly labeled augmentation data (e.g., the output of the encoder located at the source, or the result after the additional operations at the destination), and Z is information about the clusters configured for weak labeling (hereinafter 'cluster information'). Cluster information is passed from the source to the destination along with the representation that is the output of the encoder, and can be used at the destination to calculate the loss function. In Equation 14, $f(x, y)$ is any function that returns a scalar from the input (x, y). For $f(x, y)$, cosine similarity can be considered as shown in Equation 15 below.

[Equation 15]

$$f(x, y) = \frac{\operatorname{cosine}\left(g(x), g(y)\right)}{\tau}$$

In Equation 15, $g(\cdot)$ is a neural network located at the destination, and $\tau$ is a hyper-parameter that serves as the criterion for classifying pairs as positive or negative between classes. In general, a large value of $\tau$ acts as a reference point that makes classification between classes easier, whereas a small value of $\tau$ forms hard positives/hard negatives, which makes learning more difficult but can help with robust classification.

In Equation 14, $\{(x_i, y_i)\}_{i=1}^{n}$ are n independent copies of $(x, y) \sim \mathbb{E}_{z \sim P_Z}\left[P_{X|z} P_{Y|z}\right]$. That is, a cluster $z \sim P_Z$ is sampled first, and then an (x, y) pair with $x \sim P_{X|z}$ and $y \sim P_{Y|z}$ is sampled. In addition, $(x_i, y_i)$ is a positive pair representation, in which $x_i$ and $y_i$ have the same weak label, and $(x_i, y_j)$ with $i \neq j$ can be called a negative pair representation, in which $x_i$ and $y_j$ have different weak labels. Since learning is performed in mini-batches, positive/negative samples can be determined using the weak labels passed along with the representations within the batch.
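A mini-batch estimate of a clustering-based InfoNCE objective can be sketched as below. This is a sketch under stated assumptions: row i of `x` and row i of `y` are assumed to have been drawn from the same cluster, the scoring function is cosine similarity divided by a temperature with the projection network taken as the identity, and no optimization over the scoring function is performed. The function name is hypothetical.

```python
import numpy as np

def cl_infonce_estimate(x: np.ndarray, y: np.ndarray, tau: float = 0.1) -> float:
    """Mini-batch estimate of the objective (to be maximized).

    x, y: (n, d) representations; (x[i], y[i]) share a weak label (positive pair),
    and (x[i], y[j]) for i != j are treated as negative pairs.
    """
    x_unit = x / np.linalg.norm(x, axis=1, keepdims=True)
    y_unit = y / np.linalg.norm(y, axis=1, keepdims=True)
    scores = x_unit @ y_unit.T / tau                 # scores[i, j] = f(x_i, y_j)
    n = scores.shape[0]
    # log of (1/n) * sum_j exp(f(x_i, y_j)), computed stably per row
    log_mean_exp = np.logaddexp.reduce(scores, axis=1) - np.log(n)
    return float(np.mean(np.diag(scores) - log_mean_exp))
```

The estimate is bounded above by log n, so larger mini-batches allow a larger attainable value; representations that place same-cluster pairs close together yield a higher value than mismatched pairs.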

At this time, if the weak label (e.g., cluster) Z in Equation 14 is set equal to the instance ID, the setting becomes a self-supervised contrastive setting. Here, setting the weak label Z equal to the instance ID may mean that each cluster contains only one instance. Additionally, if the weak label Z consists of the labels that are the correct answers to be obtained through the downstream task, the setting becomes a supervised contrastive setting.

The goal of learning is to learn the representations X and Y delivered to the destination so as to maximize Cl-InfoNCE in Equation 14. From an information-theoretic perspective, maximizing Cl-InfoNCE leads the representations to include weak label information (e.g., clustering information), as shown in Equation 16 below.

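[Equation 16]

$$\text{Cl-InfoNCE} \le D_{\mathrm{KL}}\left(\mathbb{E}_{P_Z}\left[P_{X|Z} P_{Y|Z}\right] \,\|\, P_X P_Y\right) \le H(Z), \quad \text{with equality only when } H(Z|X) = H(Z|Y) = 0$$

In Equation 16, $H(Z)$ is the entropy of the clusters constructed for weak labeling, and $H(Z|X)$ and $H(Z|Y)$ are the conditional entropies of Z given X and Y, respectively. According to Equation 16, if $D_{\mathrm{KL}}\left(\mathbb{E}_{P_Z}\left[P_{X|Z} P_{Y|Z}\right] \,\|\, P_X P_Y\right)$ is high, it is easy to tell whether (x, y) have the same weak label. When Cl-InfoNCE is maximized, representations X and Y that include the clustering information Z, which is the weak label information, can be generated.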

As described above, to check whether the representations have been learned well enough through pre-training to perform an operation suited to the purpose of the downstream task located at the destination, and for cross validation, the relationship between Z and the labels that are the correct answers to be obtained through the downstream task (e.g., the downstream labels, denoted T) can be examined. The mutual information $I(Z;T)$ measures how related the cluster information, which is the weak label, is to the downstream labels, and the conditional entropy $H(Z|T)$ measures the amount of redundant information in the clusters that is not related to the downstream labels. For example, when the weak labels obtained from auxiliary information have high mutual information and low conditional entropy with respect to the downstream labels, improved downstream task performance can be expected from the learned representations.
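Both quantities can be estimated from paired lists of weak labels and downstream labels, as in the minimal sketch below. The function names are hypothetical, and the estimates are simple plug-in (empirical frequency) estimates in bits.

```python
from collections import Counter
import math

def entropy(labels) -> float:
    """Empirical Shannon entropy (bits) of a list of hashable labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def conditional_entropy(z, t) -> float:
    """H(Z|T) = H(Z, T) - H(T): cluster information unrelated to downstream labels."""
    return entropy(list(zip(z, t))) - entropy(list(t))

def mutual_information(z, t) -> float:
    """I(Z;T) = H(Z) - H(Z|T): how much the clusters tell about downstream labels."""
    return entropy(list(z)) - conditional_entropy(z, t)
```

For downstream labels `['dog', 'dog', 'cat', 'cat']`, clusters `[0, 0, 1, 1]` give high mutual information with zero conditional entropy, whereas instance-ID clusters `[0, 1, 2, 3]` give the same mutual information but a conditional entropy of 1 bit, reflecting the extra label-unrelated information.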

Referring to FIG. 19 in relation to performing weak labeling using auxiliary information, as k increases, the mutual information I(Z;T) increases, but the conditional entropy H(Z|T) also increases. This means that as more attributes are considered, the resulting clusters, which are the weak labels, correlate more strongly with the downstream label but may also contain more information unrelated to the downstream label. This is reflected in the Top-1 accuracy in FIG. 19, which first increases and then decreases as the number k of selected attributes increases. Therefore, for weak labeling, it is necessary to select the attributes judged to be most informative (e.g., attributes with high entropy). FIG. 19 also shows that weak labeling performs best at the intersection of I(Z;T) and -H(Z|T).
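Selecting the k most informative attributes and turning them into clusters can be sketched as follows. This is only an illustration under stated assumptions: attributes are discrete values in a (samples x attributes) array, "most informative" is taken to mean highest empirical entropy, and samples with identical values on the selected attributes form one cluster. The function names and the toy attribute table are hypothetical.

```python
import numpy as np

def attribute_entropy(column):
    """Empirical Shannon entropy (bits) of one discrete attribute."""
    _, counts = np.unique(column, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def weak_labels_from_attributes(attributes, k):
    """Pick the k highest-entropy attribute columns and assign one
    cluster id per distinct combination of their values."""
    entropies = np.array([attribute_entropy(attributes[:, j])
                          for j in range(attributes.shape[1])])
    top_k = np.argsort(-entropies, kind="stable")[:k]
    _, cluster_ids = np.unique(attributes[:, top_k], axis=0,
                               return_inverse=True)
    return top_k, cluster_ids.ravel()

# Toy attribute table: 6 samples, 3 discrete attributes.
attributes = np.array([
    [1, 0, 0],
    [1, 0, 0],
    [1, 0, 1],
    [1, 1, 1],
    [1, 1, 2],
    [1, 1, 2],
])

for k in (1, 2):
    cols, z = weak_labels_from_attributes(attributes, k)
    print(k, cols.tolist(), z.tolist())
```

Increasing k splits the samples into more, finer clusters, which matches the trade-off described above: the clusters become more specific but can also pick up information unrelated to the downstream label.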

Referring to FIG. 20 in relation to performing weak labeling using the hierarchical level of WordNet data, FIG. 20(a) shows that increasing the hierarchical level improves performance. FIG. 20(b) shows that the mutual information I(Z;T) between the weak label cluster Z and the downstream label T increases. In addition, FIG. 20(c) shows that H(Z|T) remains at 0, because in the tree hierarchy the coarse level (e.g., an intermediate node) is determined by the downstream labels (e.g., leaf nodes in the tree structure). At this time, if the generated weak label (e.g., cluster) Z is set equal to the instance ID, I(Z;T) is high, but H(Z|T) is also high. This means that the performance of self-supervised contrastive learning is reduced. Therefore, as seen in FIGS. 19 and 20, both I(Z;T) and H(Z|T) must be considered when using the clustering-based representation learning method for weak labeling.
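The hierarchical case can be illustrated with a toy tree. The hierarchy below is hypothetical and is not actual WordNet data; the sketch only shows that a coarse label taken at a given depth is fully determined by the leaf (downstream) label, so no extra uncertainty about Z remains once T is known.

```python
# Hypothetical toy hierarchy (not actual WordNet data): child -> parent.
PARENT = {
    "poodle": "dog", "beagle": "dog",
    "tabby": "cat", "siamese": "cat",
    "dog": "mammal", "cat": "mammal",
}

def path_from_root(node):
    """Return the list of nodes from the root down to `node`."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path[::-1]

def weak_label(leaf, level):
    """Weak label = ancestor of `leaf` at depth `level` (0 = root)."""
    path = path_from_root(leaf)
    return path[min(level, len(path) - 1)]

leaves = ["poodle", "beagle", "tabby", "siamese"]
for level in range(3):
    print(level, [weak_label(leaf, level) for leaf in leaves])
```

At level 0 all leaves share one cluster, at level 1 there are two clusters, and at level 2 each leaf is its own cluster. Because each leaf maps to exactly one weak label at every level, the conditional entropy of the weak label given the leaf label is zero by construction.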

In addition, since the label T, which is the correct answer to be obtained through the downstream task operation, is information that only the destination can know, the source transmits information about Z to the destination, and the destination that receives the information about Z can calculate the mutual information and the conditional entropy. Accordingly, depending on the type of auxiliary information at the source, it may be necessary for the destination to select parameters such as k, which relates to the selection of the top k attributes, or the hierarchical level of the data, and to transfer them to the source.

Meanwhile, the method of generating negative samples used for learning has a significant impact on learning performance. A hard negative is a sample that has similar data but is predicted to be dissimilar data. The performance of the semantic communication system can be improved as the destination learns hard negative samples extracted using background knowledge together with the delivered samples.

In addition, the destination can use not only the representation of the data delivered from the source but also the weak label information (e.g., cluster information) to update the background knowledge, and can perform reasoning using the background knowledge in relation to the downstream task operation. As the source passes the representation and the weak label information to the destination, the destination classifies samples as negative samples or positive samples: a sample that has passed through a path (e.g., the path in FIG. 11) and has the same weak label as a determined positive sample can also be determined to be a positive sample. Accordingly, as the number of samples determined to be positive samples increases, the number of samples determined to be negative samples can be reduced. Table 6 below shows the top-1 accuracy according to the number of positive samples when the batch size is 6144, the number of epochs is 350, and 13 samples are present. As can be seen in Table 6, increasing the number of positive samples improves top-1 accuracy. In Table 6, the case in which the number of positive samples is 1 corresponds to a self-supervised approach in which the positive sample is an augmented version of the same sample.

[Table 6]

Number of positive samples	1	3	5	7	9	No cap (13)
Top-1 accuracy	69.3	76.6	78.0	78.4	78.3	78.5

That is, the source and destination can update the background knowledge by reflecting positive/negative samples and weak labels (e.g., cluster information) in the background knowledge. In this way, the background knowledge included in the data transmitted from the source to the destination is reflected in the background knowledge of the destination, so that the source and destination can share background knowledge.

When the pre-learning shown in FIG. 11 is completed, learning to perform a downstream task at the destination can be performed, and when learning is completed, inference can be performed. At this time, it is assumed that the source and destination hold some labeled data. Figure 21 shows an example of a framework for performing learning according to a downstream task according to an embodiment of the present disclosure. The shaded portion in FIG. 21 may not be used during learning and inference operations according to downstream tasks.

Referring to FIG. 21, the destination 2120 performs learning for the operation of the downstream task located at the destination 2120 (hereinafter referred to as 'learning for the downstream task'). As an example, the destination 2120 may determine the layers 2140 used to perform learning for the downstream task (hereinafter referred to as 'downstream task learning layers'). The downstream task learning layers 2140 may include the first layer 2160 of the transform head used during pre-learning (e.g., the transform head 1150 in FIG. 11 or the transform head 2170 in FIG. 21, where pre-learning is, e.g., the pre-learning operation in FIG. 11) and additional linear layers suited to the purpose of the downstream task.

Meanwhile, depending on the number of paths used in learning for the downstream task, one or two encoding results 2180 and 2182 may be transmitted to the destination 2120. If the two encoding results 2180 and 2182 are delivered to the destination 2120, an additional operation 2130 can be performed as follows.

As an example, when both encoding results 2180 and 2182 produced over the two paths are used for learning for the downstream task, the source 2110 uses both paths and can transmit the two encoding results 2180 and 2182 to the destination 2120. The destination 2120, having received the two encoding results 2180 and 2182, may perform the additional operation 2130 to convert the two encoding results 2180 and 2182 into one result. At this time, one of various functions such as sum, average, and concatenation can be used. The corresponding operation can also be learned through a neural network. As another example, when only one path is used for learning for the downstream task, the source 2110 may select only one path and transmit the resulting single encoding result (2180 or 2182) to the destination 2120. At this time, the destination 2120 may not perform the additional operation 2130.
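The fixed-function variants of the additional operation 2130 (sum, average, concatenation) can be sketched as below. The function name `merge_encodings` and the array shapes are assumptions for illustration; a learned merge through a neural network is not shown.

```python
import numpy as np

def merge_encodings(h1, h2, mode="average"):
    """Combine two encoding results of shape (batch, dim) into one."""
    if mode == "sum":
        return h1 + h2
    if mode == "average":
        return (h1 + h2) / 2.0
    if mode == "concat":
        # Concatenation doubles the feature dimension.
        return np.concatenate([h1, h2], axis=-1)
    raise ValueError(f"unknown merge mode: {mode}")

h1 = np.ones((2, 3))        # stand-in for the first encoding result
h2 = 3.0 * np.ones((2, 3))  # stand-in for the second encoding result

print(merge_encodings(h1, h2, "sum").shape)      # (2, 3)
print(merge_encodings(h1, h2, "average").shape)  # (2, 3)
print(merge_encodings(h1, h2, "concat").shape)   # (2, 6)
```

Sum and average keep the representation dimension unchanged, while concatenation requires the downstream task learning layers to accept an input of twice the size.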

Once the downstream task learning layers are determined, the destination 2120 can learn the representation received from the source 2110 using the downstream task learning layers 2140. At this time, the destination 2120 can use the background knowledge of the destination 2120 updated during the pre-learning process to infer an output that matches the intention delivered by the source 2110.

Meanwhile, the destination 2120 in FIG. 21 can perform learning using a loss function. The destination 2120 can perform learning using the labeled data 2150 it holds and the output of the downstream task learning layers 2140. As an example, learning may be performed using a cross entropy loss. The cross entropy loss is only one example of a loss function used for learning; the disclosure is not limited to it, and other loss functions (e.g., cosine similarity loss, hinge loss, etc.) can be used. Learning using a loss function can be performed according to the purpose of the downstream task located at the destination.
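As a concrete instance of the loss mentioned above, a cross entropy loss over the outputs of the downstream task learning layers can be written as follows. This is a generic, numerically stable formulation and not a specific implementation from the disclosure; the names `logits` and `labels` are assumptions standing in for the layer outputs and the labeled data 2150.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean cross entropy between softmax(logits) and integer labels.

    logits: (batch, num_classes) array of unnormalized scores.
    labels: (batch,) array of integer class indices.
    """
    # Subtract the row-wise maximum for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0]])
labels = np.array([0, 2])
print(cross_entropy(logits, labels))
```

For uniform scores over C classes the loss equals log(C), which gives a simple sanity check when swapping in another loss function such as a hinge loss.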

According to one embodiment, when the destination 2120 performs fine-tuning after pre-learning is completed, learning can be performed for the entire network, including the neural network consisting of the downstream task learning layers 2140, using the weights of the encoders 2180 and 2182 located in the source 2110, the weights for the additional operation of the destination 2120, and the weights corresponding to the first layer of the transform head 2170.

According to another embodiment, when the destination 2120 performs transfer learning after pre-learning is completed, the weights of the encoders 2180 and 2182 located in the source 2110, the weights for the additional operation of the destination 2120, and the weights corresponding to the first layer of the transform head 2170 are fixed, and learning can be performed on the neural network added to suit the purpose of the downstream task.

At this time, fixing the weights of the encoders 2180 and 2182, the weights for the additional operation of the destination 2120, and the weights corresponding to the first layer of the transform head 2170 may mean that the feature extractor is fixed. If the downstream task learning layers 2140, excluding the part whose weights are fixed, include only simple linear layers, any performance improvement through learning depends on the performance of the feature extractor, so the performance of the feature extractor can be checked.
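The difference between the two embodiments comes down to which groups of weights are updated during learning for the downstream task. The sketch below encodes only that rule; the group names are hypothetical labels for the weights described above and do not refer to any particular framework.

```python
# Hypothetical names for the weight groups obtained from pre-learning.
PRETRAINED_GROUPS = (
    "encoder",                     # encoder weights at the source
    "additional_operation",        # weights of the additional operation
    "transform_head_first_layer",  # first layer of the transform head
)
DOWNSTREAM_GROUP = "downstream_layers"  # layers added for the downstream task

def trainable_groups(mode):
    """Return the set of weight groups updated in the given mode."""
    if mode == "fine_tuning":
        # The entire network is updated, starting from the pre-learned weights.
        return set(PRETRAINED_GROUPS) | {DOWNSTREAM_GROUP}
    if mode == "transfer_learning":
        # The feature extractor is fixed; only the added layers are learned.
        return {DOWNSTREAM_GROUP}
    raise ValueError(f"unknown mode: {mode}")

for mode in ("fine_tuning", "transfer_learning"):
    frozen = set(PRETRAINED_GROUPS) - trainable_groups(mode)
    print(mode, "frozen:", sorted(frozen))
```

In a framework-based implementation the frozen groups would typically be excluded from the optimizer or have their gradients disabled; that detail is outside the scope of this sketch.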

In this way, learning for a downstream task can be performed by training the related networks according to the purpose of the downstream task. Meanwhile, when pre-learning and learning for the downstream task are completed in a semantic communication system, inference can be performed on the entire network for which all learning has been completed. Here, inference may mean an operation in which the destination 2120 infers the intention conveyed by the source 2110 in task-oriented semantic communication. Therefore, the output of the downstream task learning layers 2140 of FIG. 21 can be viewed as the result of performing inference. The semantic representation transmitted from the source 2310 for the learning and inference operations for performing downstream tasks may be reflected in the background knowledge of the source 2310 and the destination 2320.

Referring to FIG. 22, in step S2201, the first device receives a request for capability information for the first device from the second device. In step S2203, the first device transmits capability information to the second device. Here, the capability information is used to determine whether the first device can perform semantic communication. As an example, the capability information may include the type of raw data that the first device can collect, generate, or process and computing capability information of the first device.

In step S2205, when it is determined that the first device has semantic communication capabilities based on the capability information of the first device, the first device receives semantic communication-related information from the second device. Semantic communication-related information can be used to generate a semantic communication signal by performing semantic source coding. A semantic communication signal may be a representation containing the meaning that the first device intends to convey to the second device. The semantic communication signal may be used to perform downstream tasks without being decoded by the second device into the raw data used by the first device to create the representation. Semantic communication signals may be used to update shared information (e.g., background knowledge) held by the first and second devices.

As an example, the semantic communication signal may include at least one of the representations used in pre-learning for semantic source coding, the representations used in learning to perform downstream tasks, and the representations used in inference. Pre-learning, learning for downstream tasks, and inference may be performed by the first device and the second device. As an example, the semantic communication-related information may include at least one of the unit of data to be obtained from raw data, information for performing weak labeling, a mini-batch size, an augmentation type and ratio determined based on background knowledge, and information about the encoding model. Thereafter, the semantic communication-related information can be updated based on the shared information, which is itself updated using the representations used in pre-learning for semantic source coding, the representations used in learning to perform downstream tasks, and the representations used in inference. As an example, the information for performing weak labeling may include at least one of auxiliary information, whether auxiliary information is used, the type of auxiliary information, a method of performing labeling based on auxiliary information, and a weak label.

In step S2207, the first device may generate a semantic communication signal based on the semantic communication-related information. In step S2209, the first device may transmit the generated semantic communication signal and information for weak labeling to the second device. The second device can perform a downstream task using the semantic communication signal without a signal restoration procedure. Additionally, the second device may obtain background knowledge information of the first device based on the semantic communication signal and the information for weak labeling, and update the background knowledge held by the second device.

In FIG. 22, the semantic signal generation procedure is described through the operation between the first device and the second device, but it is only an example for convenience of explanation and may not be limited to the above-described embodiment. That is, it can be used in various embodiments, such as operations between terminals and base stations and operations between terminals (e.g., D2D communication).

Referring to FIG. 23, in step S2301, the device and the base station can perform synchronization. As an example, the device may receive a synchronization signal block (SSB) that includes a master information block (MIB). The device may perform initial connection based on SSB.

In step S2303, the base station may request terminal capability information from the device. In step S2305, the device may transmit terminal capability information to the base station. Terminal capability information is information about whether the terminal has the ability to perform semantic communication. The base station may request terminal capability information from the terminal to check whether semantic communication is performed. Terminal capability information may include information about the types of raw data that the terminal can generate, collect, or process, and the computing capabilities of the device.

In step S2307, the base station may determine whether the terminal can perform semantic communication based on terminal capability information. Hereinafter, steps S2309 and S2311 may be performed when the base station determines that the terminal can perform semantic communication based on terminal capability information.

In step S2309, the base station may transmit semantic communication-related information to the device. In step S2311, the device may store the semantic communication-related information. The semantic communication-related information may include the acquisition unit of semantic data, whether auxiliary information is used, the weak labeling method according to the type of auxiliary information (e.g., discrete attribute, hierarchical level) (e.g., k in FIG. 13, the number of attributes selected in descending order of entropy, or the hierarchical level in FIG. 14), the mini-batch size, the augmentation type and augmentation ratio according to domain knowledge, and information about the encoder model. As an example, the semantic communication-related information may be transmitted by being included in at least one of downlink control information (DCI), a media access control (MAC) message, or a radio resource control (RRC) message.
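The capability check and configuration of steps S2303 to S2311 can be sketched with two plain data containers. Everything named here is an assumption for illustration: the field names, the scalar `computing_capability` score, the threshold rule in `configure`, and the example values are hypothetical, since the disclosure does not specify the decision criterion or a message encoding.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class CapabilityInfo:
    raw_data_types: Tuple[str, ...]  # raw data the terminal can generate, collect, or process
    computing_capability: float      # hypothetical scalar score

@dataclass
class SemanticCommInfo:
    acquisition_unit: str            # acquisition unit of semantic data
    use_auxiliary_info: bool
    auxiliary_type: Optional[str]    # "discrete_attribute" or "hierarchical_level"
    weak_label_param: Optional[int]  # k, or the hierarchical level
    mini_batch_size: int
    augmentation_type: str
    augmentation_ratio: float
    encoder_model: str

def configure(cap, min_compute=1.0, supported=("image",)):
    """Base station side of S2307/S2309: return the configuration if the
    terminal is judged capable, otherwise None (hypothetical criterion)."""
    capable = (cap.computing_capability >= min_compute
               and any(t in supported for t in cap.raw_data_types))
    if not capable:
        return None
    return SemanticCommInfo(
        acquisition_unit="image",
        use_auxiliary_info=True,
        auxiliary_type="discrete_attribute",
        weak_label_param=10,
        mini_batch_size=256,
        augmentation_type="crop",
        augmentation_ratio=0.5,
        encoder_model="example-encoder",
    )

stored = configure(CapabilityInfo(("image", "text"), 2.0))  # S2311: terminal stores this
print(stored)
```

When `configure` returns None, the remaining steps are skipped, which mirrors the condition that steps S2309 and S2311 are performed only for a terminal determined to be capable of semantic communication.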

FIG. 24 shows an example of an information exchange diagram in units of mini-batches according to an embodiment of the present disclosure. If the mini-batch size is set to N, 2N pieces of augmentation data can be generated at the source. The encoder at the source can encode the 2N pieces of augmentation data to generate 2N representations. Afterwards, the source can transmit the generated 2N representations to the destination. Additionally, if there is a change in the information for weak labeling related to the use of auxiliary information, the destination may transmit the corresponding information to the source.
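The 2N bookkeeping can be made concrete with a minimal sketch. The augmentation (additive Gaussian noise) and the encoder (a single random linear map followed by tanh) are placeholders chosen only so that the example runs; they are assumptions and not the augmentation types or encoder model of the disclosure.

```python
import numpy as np

def augment(batch, rng, noise_scale=0.1):
    """Placeholder augmentation: add small Gaussian noise to each sample."""
    return batch + noise_scale * rng.standard_normal(batch.shape)

def make_representations(batch, encoder_weights, rng):
    """Create two augmented views per sample and encode all 2N views."""
    views = np.concatenate([augment(batch, rng), augment(batch, rng)], axis=0)
    return np.tanh(views @ encoder_weights)  # stand-in encoder

rng = np.random.default_rng(0)
N, input_dim, repr_dim = 4, 8, 3
batch = rng.standard_normal((N, input_dim))
encoder_weights = rng.standard_normal((input_dim, repr_dim))

representations = make_representations(batch, encoder_weights, rng)
print(representations.shape)  # (8, 3)
```

With this layout, rows i and i + N are the two views of sample i, so the source can attach the same weak label to both rows before transmitting the 2N representations.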

Referring to FIG. 24, in step S2401, the source may transmit information for a forward pass to the destination. The information for the forward pass may include the representation vectors that are the result of encoding the augmentation data, and weak labeling information.

In step S2403, the destination may transmit information for a backward pass to the source. The information for the backward pass may include gradient information used for learning. At this time, as described above, information for weak labeling may be transmitted together. For example, when discrete attributes are used for weak labeling, information on k for setting up clusters from the top k attributes can be transmitted. As another example, when a hierarchical level is used for weak labeling, level information for setting the hierarchical level can be transmitted.
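The per-mini-batch exchange of steps S2401 and S2403 can be summarized as two message types. The class and field names below are hypothetical; the sketch only shows that the backward-pass message carries gradients and, optionally, an updated weak labeling parameter (k or a hierarchical level) that the source applies before the next mini-batch.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class ForwardPassInfo:               # S2401: source -> destination
    representations: List[List[float]]
    weak_labels: List[int]

@dataclass
class BackwardPassInfo:              # S2403: destination -> source
    gradients: List[List[float]]
    top_k: Optional[int] = None            # sent when discrete attributes are used
    hierarchy_level: Optional[int] = None  # sent when a hierarchical level is used

def apply_backward(info: BackwardPassInfo, params: Dict[str, int]) -> Dict[str, int]:
    """Return the source's weak labeling parameters after a backward pass.
    Only parameters present in the message are overwritten."""
    updated = dict(params)
    if info.top_k is not None:
        updated["top_k"] = info.top_k
    if info.hierarchy_level is not None:
        updated["hierarchy_level"] = info.hierarchy_level
    return updated

forward = ForwardPassInfo(representations=[[0.1, 0.2], [0.3, 0.4]], weak_labels=[0, 0])
backward = BackwardPassInfo(gradients=[[0.0, 0.0], [0.0, 0.0]], top_k=20)
print(apply_backward(backward, {"top_k": 10}))  # {'top_k': 20}
```

A backward-pass message without either optional field leaves the source's parameters unchanged, corresponding to the case in which the weak labeling information has not changed.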

Some steps described in FIGS. 23 and 24 may be omitted depending on the situation or settings.

The present disclosure may be embodied in other specific forms without departing from the technical ideas and essential features described in the present disclosure. Accordingly, the above detailed description should not be construed as restrictive in all respects and should be considered illustrative. The scope of this disclosure should be determined by reasonable interpretation of the appended claims, and all changes within the equivalent scope of this disclosure are included in the scope of this disclosure. In addition, claims that do not have an explicit reference relationship in the patent claims can be combined to form an embodiment or included as a new claim through amendment after filing.

Embodiments of the present disclosure can be applied to various wireless access systems. Examples of various wireless access systems include the 3rd Generation Partnership Project (3GPP) or 3GPP2 system.

Embodiments of the present disclosure can be applied not only to the various wireless access systems, but also to all technical fields that apply the various wireless access systems. Furthermore, the proposed method can also be applied to mmWave and THz communication systems using ultra-high frequency bands.

Additionally, embodiments of the present disclosure can be applied to various applications such as autonomous vehicles and drones.

Claims

A method of operating a first device in a wireless communication system, the method comprising:

receiving, from a second device, a request for capability information of the first device;

transmitting the capability information of the first device to the second device;

if the first device is a device having a semantic communication capability based on the capability information of the first device, receiving semantic communication-related information from the second device;

generating a semantic communication signal based on the semantic communication-related information; and

transmitting, to the second device, the semantic communication signal and information for performing weak labeling,

wherein the semantic communication signal is generated using shared information and the information for performing the weak labeling,

wherein the weak labeling is an operation of assigning a weak label to semantic data using the shared information and auxiliary information, and

wherein an update of the shared information is performed based on an operation of a downstream task performed in the second device.
The method of claim 1, wherein the semantic communication signal is used by the second device to perform a downstream task without being decoded into the raw data used by the first device to create the representation.
The method of claim 1, wherein the capability information is information for determining whether the first device can perform semantic communication, and includes information on the type of raw data that the first device can process and on the computational capability of the first device.
The method of claim 1, wherein the semantic communication-related information includes at least one of an acquisition unit of semantic data, the information for performing the weak labeling, a mini-batch size, an augmentation type and an augmentation ratio, and configuration information of an encoding model,

wherein the semantic data is data extracted from raw data,

wherein the information for performing the weak labeling includes auxiliary information, and

wherein the acquisition unit and the augmentation type and augmentation ratio are determined based on shared information of the first device and the second device.
The method of claim 4, further comprising:

obtaining the semantic data from the raw data;

performing the weak labeling on the semantic data to generate weakly labeled semantic data; and

generating augmentation data from the weakly labeled semantic data.
The method of claim 1, wherein the update of the shared information is performed using a signal converted from the semantic communication signal, and

wherein the converted signal is generated based on a data format used to perform a downstream task.
The method of claim 1, wherein the update of the shared information is performed using a transform head, and

wherein the transform head includes at least one dense layer and at least one non-linear function.
The method of claim 1, wherein the update of the shared information is performed using at least one of a representation used in pre-learning, a representation used in learning to perform a downstream task, and a representation used in inference.
The method of claim 8, wherein the learning for the downstream task is performed based on the first layer of the transform head and at least one layer determined for performing the downstream task.
The method of claim 8, wherein the learning for the downstream task comprises a fine-tuning operation or a transfer-learning operation.
The method of claim 10, wherein the fine-tuning operation is performed, after pre-learning is completed, on the entire network including a neural network determined according to the downstream task, using the weights of the encoder, the weights for the additional operation, and the weights for the first layer of the transform head.
The method of claim 10, wherein the transfer-learning operation is performed, after pre-learning is completed, on a multi-layer perceptron (MLP) with the weights of the encoder, the weights for the additional operation, and the weights for the first layer of the transform head being fixed.
The method of claim 1, wherein the semantic communication signal is transmitted on a layer for semantic communication.
A method of operating a second device in a wireless communication system, the method comprising:

transmitting a capability information request to a first device;

receiving capability information from the first device;

if the first device is a device having a semantic communication capability based on the capability information of the first device, transmitting semantic communication-related information to the first device; and

receiving, from the first device, a semantic communication signal generated based on the semantic communication-related information and information for performing weak labeling,

wherein the semantic communication signal is generated using shared information and the information for performing the weak labeling,

wherein the weak labeling is an operation of assigning a weak label to semantic data using the shared information and auxiliary information, and

wherein an update of the shared information is performed based on an operation of a downstream task performed in the second device.
A first device in a wireless communication system, the first device comprising:

a transceiver; and

a processor connected to the transceiver,

wherein the processor is configured to control the first device to:

receive, from a second device, a request for capability information of the first device;

transmit the capability information of the first device to the second device;

if the first device is a device having a semantic communication capability based on the capability information of the first device, receive semantic communication-related information from the second device;

generate a semantic communication signal based on the semantic communication-related information; and

transmit, to the second device, the semantic communication signal and information for performing weak labeling,

wherein the semantic communication signal is generated using shared information and the information for performing the weak labeling,

wherein the weak labeling is an operation of assigning a weak label to semantic data using the shared information and auxiliary information, and

wherein an update of the shared information is performed based on an operation of a downstream task performed in the second device.
A second device in a wireless communication system, the second device comprising:

a transceiver; and

a processor connected to the transceiver,

wherein the processor is configured to control the second device to:

transmit a capability information request to a first device;

receive capability information from the first device;

if the first device is a device having a semantic communication capability based on the capability information of the first device, transmit semantic communication-related information to the first device; and

receive, from the first device, a semantic communication signal generated based on the semantic communication-related information and information for performing weak labeling,

wherein the semantic communication signal is generated using shared information and the information for performing the weak labeling,

wherein the weak labeling is an operation of assigning a weak label to semantic data using the shared information and auxiliary information, and

wherein an update of the shared information is performed based on an operation of a downstream task performed in the second device.
A first device comprising at least one memory and at least one processor functionally connected to the at least one memory,

wherein the at least one processor is configured to control the first device to:

receive, from a second device, a request for capability information of the first device;

transmit the capability information of the first device to the second device;

if the first device is a device having a semantic communication capability based on the capability information of the first device, receive semantic communication-related information from the second device;

generate a semantic communication signal based on the semantic communication-related information; and

transmit, to the second device, the semantic communication signal and information for performing weak labeling,

wherein the semantic communication signal is generated using shared information and the information for performing the weak labeling,

wherein the weak labeling is an operation of assigning a weak label to semantic data using the shared information and auxiliary information, and

wherein an update of the shared information is performed based on an operation of a downstream task performed in the second device.
A non-transitory computer-readable medium storing at least one instruction,

the at least one instruction being executable by a processor,

wherein the at least one instruction causes the processor to control to:

receive a capability information request from a second device;

transmit capability information to the second device;

if the computer-readable medium is a medium having a semantic communication capability based on the capability information, receive semantic communication-related information from the second device;

generate a semantic communication signal based on the semantic communication-related information; and

transmit, to the second device, the semantic communication signal and information for performing weak labeling,

wherein the semantic communication signal is generated using shared information and the information for performing the weak labeling,

wherein the weak labeling is an operation of assigning a weak label to semantic data using the shared information and auxiliary information, and

wherein an update of the shared information is performed based on an operation of a downstream task performed in the second device.