WO2001043443A1

WO2001043443A1 - Video encoding/transmitting device, video receiving/decoding device, video transmitting/receiving device, and video transmitting system

Info

Publication number: WO2001043443A1
Application number: PCT/JP2000/008603
Authority: WO
Inventors: Morio Yoshimoto; Yukinari Matsuda; Susumu Oka
Original assignee: Mitsubishi Denki Kabushiki Kaisha
Priority date: 1999-12-07
Filing date: 2000-12-05
Publication date: 2001-06-14
Also published as: CN1409928A

Abstract

On the transmitting side, a video signal is object-encoded by an object extraction unit (1) and an object encoding unit (2), a part or all of the object encoded by an object synthesis unit (3) and an object which is object-encoded in advance are combined, and then video data after the combining is transmitted by a circuit interface unit (8).

Description

Video encoding and transmitting device, video receiving and decoding device, video transmitting and receiving device, and video transmission system

The present invention relates to a video encoding and transmitting device, a video receiving and decoding device, a video transmitting and receiving device, and a video transmission system for transmitting video and audio via a predetermined line, and more particularly to such devices and systems that use an object encoding technology.

Background art

FIG. 1 is a block diagram showing a conventional video encoding and transmitting apparatus described in, for example, Japanese Patent Application Laid-Open No. 10-42275. In the figure, reference numeral 101 denotes a camera signal processing unit that performs signal processing such as NTSC (National Television System Committee) decoding and A/D conversion on a video signal from a video camera that captures images using an image sensor such as a CCD (Charge Coupled Device); 102 denotes a moving image data encoding unit that encodes the video signal after A/D conversion as moving image data using the H.261 method; and 103 denotes a still image data encoding unit that encodes the video signal after A/D conversion as still image data using the JPEG (Joint Photographic Experts Group) method. Reference numeral 104 denotes an image data switching unit for switching the image data to be transmitted, and 105 denotes an audio signal processing unit for performing signal processing such as A/D conversion on an audio signal from a microphone. Reference numeral 106 denotes an audio data encoding unit for encoding the audio signal after A/D conversion, 107 denotes a demultiplexing unit for multiplexing image data and audio data, and 108 denotes a line interface unit for transmitting the multiplexed data.

Next, the operation will be described.

After the camera signal processing unit 101 performs signal processing such as NTSC decoding and A/D conversion on the video signal from the video camera that captures images using an image sensor such as a CCD, the moving image data encoding unit 102 encodes the video signal after A/D conversion as moving image data according to the H.261 method, and the still image data encoding unit 103 encodes the video signal after A/D conversion as still image data in JPEG format.

Then, the image data switching unit 104 switches the image data to be transmitted according to the movement of the object in the image, and supplies either the moving image data or the still image data to the demultiplexing unit 107.

On the other hand, the audio signal processing unit 105 performs signal processing such as A/D conversion on the audio signal from the microphone, and the audio data encoding unit 106 encodes the audio signal after A/D conversion. The audio data is supplied to the demultiplexing unit 107.

The demultiplexing unit 107 multiplexes the image data and the audio data, and the line interface unit 108 transmits the multiplexed data via a line such as an ISDN line.

Related prior art is also described in Japanese Patent Application Laid-Open No. 7-154567/1995.

Since the conventional video encoding and transmitting apparatus is configured as described above, unnecessary background is included in the video, which makes it difficult to reduce the amount of data to be transmitted. There is also the problem that the caller's calling location can be identified on the receiving side.

The present invention has been made to solve the above-described problems. An object of the present invention is to obtain a video encoding and transmitting device, a video transmitting and receiving device, and a video transmission system in which a video signal is object-encoded on the transmitting side, a part or all of the encoded objects are combined with an object that has been object-encoded in advance, and the combined video data is transmitted, so that video can be transmitted without the caller's calling location being identified on the receiving side.

Another object of the present invention is to obtain a video receiving and decoding device, a video transmitting and receiving device, and a video transmission system in which a video signal is object-encoded on the transmitting side, only a part of the encoded objects is transmitted, and on the receiving side the received object is combined with an object encoded in advance and the combined video data is decoded, so that video can be transmitted without the caller's calling location being identified on the receiving side and the amount of data to be transmitted can be reduced.

Disclosure of the invention

A video encoding and transmitting apparatus according to the present invention includes: media encoding means for object-encoding an externally supplied video signal; transmission stream synthesizing means for combining a part or all of the objects encoded by the media encoding means with an object encoded in advance; and stream transmitting means for transmitting the video data combined by the transmission stream synthesizing means.

This makes it possible to transmit the video so that the caller's calling location is not specified on the receiving side, and has the effect of reducing the amount of data to be transmitted.

The video encoding and transmitting apparatus according to the present invention includes stream storage means for recording objects that have been object-encoded in advance.

As a result, only a part of the objects needs to be transmitted, so that the amount of data to be transmitted can be reduced, and furthermore, the caller's calling location can be prevented from being identified on the receiving side.

In the video encoding and transmitting apparatus according to the present invention, the transmission stream synthesizing means performs the combining process using the video data output from the stream storage means as a background for the video data encoded by the media encoding means.

As a result, only a part of the objects needs to be transmitted, so that the amount of data to be transmitted can be reduced, and furthermore, the caller's calling location can be prevented from being identified on the receiving side.

A video encoding and transmitting apparatus according to the present invention is configured such that the video data is a moving image or a still image.

This has the effect of preventing the caller's originating location from being specified on the receiving side.

A video encoding and transmitting apparatus according to the present invention includes a control unit that controls a transmission stream combining unit according to a transmission destination.

This makes it possible to change the object included in the video data to be transmitted according to the transmission destination, so that the receiving side cannot identify the caller's calling location, and furthermore, the amount of data to be transmitted can be reduced.

A video encoding and transmitting apparatus according to the present invention is configured to combine an audio signal supplied from the outside with a previously acquired audio signal, and then transmit audio data corresponding to the combined audio signal together with the video data.

This has the effect of making it possible to prevent the caller's calling location from being identified on the receiving side.

In the video encoding and transmitting apparatus according to the present invention, the transmission stream synthesizing means combines audio data supplied from the outside or from the stream storage means with video data supplied from the outside or from the stream storage means.

This has the effect of making it possible to prevent the caller's originating location from being specified on the receiving side.

The video encoding / transmitting apparatus according to the present invention is configured to read out the object encoded in advance from the stream storage means, thereby simplifying the exchange of the object for synthesis. In addition to this, the portability of the object to be synthesized is improved, and for example, an effect of being able to synthesize an object in the background of a place that has not been visited in the past is obtained.

In the video encoding and transmitting apparatus according to the present invention, the stream storage means records one or both of the video data and the audio data which have been previously encoded.

In the video encoding and transmitting apparatus according to the present invention, the control means selects, based on the communication partner or the communication date and time, the object to be output from the stream storage means, which records a plurality of object-encoded objects.

As a result, the caller's calling location can be prevented from being identified on the receiving side according to the transmission destination, and the amount of data to be transmitted can be further reduced.

A video encoding and transmitting apparatus according to the present invention is configured to generate video data and audio data by performing coding according to the MPEG-4 system.

As a result, the present invention can be used widely once devices compatible with the MPEG-4 system become widespread.

A video receiving and decoding apparatus according to the present invention includes: stream receiving means for receiving object-encoded video data; reception stream synthesizing means for combining a part or all of the objects in the video data received by the stream receiving means with an object that has been object-encoded in advance; and media decoding means for decoding the video data combined by the reception stream synthesizing means.

A video receiving / decoding apparatus according to the present invention includes stream storage means for recording an object encoded in advance.

As a result, only a part of the objects needs to be transmitted, so that the amount of data to be transmitted can be reduced, and furthermore, the caller's calling location can be prevented from being identified on the receiving side.

In the video receiving and decoding apparatus according to the present invention, the reception stream synthesizing means performs the combining process using the video data output from the stream storage means as a background for the video data received by the stream receiving means.

As a result, only a part of the objects needs to be transmitted, so that the amount of data to be transmitted can be reduced, and furthermore, the caller's calling location can be prevented from being identified on the receiving side.

A video receiving and decoding apparatus according to the present invention is configured such that the video data is a moving image or a still image.

A video receiving and decoding apparatus according to the present invention is configured to combine an object of a person part received by the stream receiving means with an object of a background part that has been object-encoded in advance.

As a result, only a part of the objects needs to be transmitted, so that the amount of data to be transmitted can be reduced, and furthermore, the caller's calling location can be prevented from being identified on the receiving side.

A video receiving and decoding apparatus according to the present invention includes a control unit that controls a receiving stream combining unit according to a transmission source.

With this, it is possible to appropriately select whether or not to execute object combining according to the transmission source, and it is possible to reduce the amount of data to be transmitted.

A video receiving / decoding device according to the present invention is configured to synthesize an audio signal corresponding to audio data received by stream receiving means with an audio signal obtained in advance.

In the video receiving and decoding device according to the present invention, the reception stream synthesizing means combines audio data supplied from the outside or from the stream storage means with video data supplied from the outside or from the stream storage means.

As a result, video can be transmitted so that the receiving side cannot identify where the caller is calling from, and the amount of data to be transmitted can be reduced.

The video receiving / decoding device according to the present invention is configured to read out an object which has been previously coded from the stream storage means, thereby simplifying the exchange of the object for synthesis. In addition to this, the portability of the object to be synthesized is improved, and for example, it is possible to synthesize an object in the background of a place that has not been visited in the past.

In the video receiving / decoding device according to the present invention, the stream storage means records one or both of the video data and the audio data which have been object-coded in advance.

In the video receiving and decoding device according to the present invention, the control means selects, based on the communication partner or the communication date and time, the object to be output from the stream storage means, which records a plurality of object-encoded objects.

A video receiving / decoding device according to the present invention is configured to generate video data and audio data by performing encoding according to the MPEG-4 method.

A video transmitting and receiving device according to the present invention includes: a transmission processing unit having media encoding means for object-encoding one or both of a video signal and an audio signal supplied from the outside, transmission stream synthesizing means for combining a part or all of the objects encoded by the media encoding means with an object encoded in advance, and stream transmitting means for transmitting one or both of the video data and the audio data combined by the transmission stream synthesizing means; and a reception processing unit having stream receiving means for receiving one or both of object-encoded video data and audio data, reception stream synthesizing means for combining an object in one or both of the video data and the audio data received by the stream receiving means with an object encoded in advance, and media decoding means for decoding one or both of the video data and the audio data combined by the reception stream synthesizing means.

This allows two-way communication without significantly increasing the circuit scale, allows video to be transmitted so that the caller's calling location is not identified on the receiving side, and reduces the amount of data to be transmitted.

A video transmission system according to the present invention includes: a video encoding and transmitting device having media encoding means for object-encoding one or both of a video signal and an audio signal supplied from the outside, transmission stream synthesizing means for combining a part or all of the objects encoded by the media encoding means with an object encoded in advance, and stream transmitting means for transmitting one or both of the video data and the audio data combined by the transmission stream synthesizing means; and a receiving device that receives and decodes one or both of the video data and the audio data from the video encoding and transmitting device.

This allows two-way communication without significantly increasing the circuit scale, allows video to be transmitted so that the caller's calling location is not identified on the receiving side, and reduces the amount of data to be transmitted.

A video transmission system according to the present invention includes: a transmitting device that object-encodes one or both of a video signal and an audio signal supplied from the outside and transmits one or both of the object-encoded video data and audio data; and a video receiving and decoding device having stream receiving means for receiving one or both of the object-encoded video data and audio data from the transmitting device, reception stream synthesizing means for combining an object in one or both of the video data and the audio data received by the stream receiving means with an object encoded in advance, and media decoding means for decoding one or both of the video data and the audio data combined by the reception stream synthesizing means.

This allows two-way communication without significantly increasing the circuit scale, allows video to be transmitted so that the caller's calling location is not identified on the receiving side, and reduces the amount of data to be transmitted.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 is a block diagram showing a conventional video encoding and transmitting device.

FIG. 2 is a block diagram showing a configuration of a video encoding and transmitting apparatus according to Embodiment 1 of the present invention.

FIG. 3 is a block diagram of a video receiving and decoding apparatus according to Embodiment 2 of the present invention.

FIG. 4 is a block diagram showing a configuration of a video transmitting / receiving device according to Embodiment 3 of the present invention.

FIG. 5 is a diagram showing an example of a network provided with a video transmission system according to Embodiment 4 of the present invention.

FIG. 6 is a block diagram showing a configuration of a video transmission system according to Embodiment 4 of the present invention.

FIG. 7 is a block diagram showing a configuration of a video transmission system according to Embodiment 5 of the present invention.

BEST MODE FOR CARRYING OUT THE INVENTION

Hereinafter, in order to explain this invention in greater detail, the preferred embodiments of the present invention will be described with reference to the accompanying drawings.

Embodiment 1.

FIG. 2 is a block diagram showing a configuration of a video encoding and transmitting apparatus according to Embodiment 1 of the present invention. In the figure, reference numeral 1 denotes an object extracting unit that processes a video signal from a camera that captures images using an image sensor such as a CCD and divides the video data into objects; 2 denotes an object encoding unit (media encoding means) that encodes the video signal, based on the data from the object extracting unit 1, using a predetermined object encoding method such as MPEG (Moving Picture Experts Group)-4; and 3 denotes an object synthesizing unit (transmission stream synthesizing means) that combines the object-encoded video data from the object encoding unit 2 with video data encoded in advance and stored in the recording medium 4 (stream storage means).

Reference numeral 4 denotes a recording medium, such as a flash memory or a disk-type recording medium (optical disk, magnetic disk, magneto-optical disk), that stores object-encoded video data and encoded audio data supplied from the object encoding unit 2, the audio encoding unit (media encoding means) 6, and the outside.

Reference numeral 5 denotes an audio adding unit (audio synthesizing means) that adds an audio signal input from a microphone or the like to the audio signal decoded by the audio decoding unit 7; 6 denotes an audio encoding unit that encodes the audio signal from the audio adding unit 5 in a predetermined format; and 7 denotes an audio decoding unit that decodes the encoded audio data stored in the recording medium 4.

Reference numeral 8 denotes a line interface unit (stream transmitting means) for transmitting data from the object synthesizing unit 3 to the receiving side via a predetermined line.

Reference numeral 9 denotes a call control unit (control means) for controlling the object synthesizing unit 3 and the recording medium 4 according to the control information to be transmitted and the receiving side device of the communication partner.

Next, the operation will be described.

When a video signal is supplied, the object extracting unit 1 processes the video signal based on motion and color information and divides the video data into objects, and the object encoding unit 2 encodes each of these objects.

The object-encoded video data is supplied to the object synthesizing unit 3 or the recording medium 4, or to both as necessary.

When the video data is supplied, the object synthesizing unit 3 combines a part or all of the objects with an object encoded in advance, such as one stored in the recording medium 4, and supplies the combined data to the line interface unit 8. For example, of the object-encoded video data, the object of the caller's person part is combined with video data of a background part that has been object-encoded in advance. At this time, in response to the control signal from the call control unit 9, the object synthesizing unit 3 supplies to the line interface unit 8 either a part (for example, the object of the person part in the video) or all of the object-encoded video data from the object encoding unit 2, or the combined data. For example, the combined data from the object synthesizing unit 3 is supplied to the line interface unit 8 only when video is transmitted to and received from a predetermined communication partner.
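As a rough illustration of dividing a frame into objects based on motion, the following sketch marks pixels that change between two consecutive frames as a moving object and treats the rest as background. The frame representation (lists of grayscale values), the threshold value, and the function names `motion_mask` and `split_objects` are assumptions made for illustration only; the extraction algorithm itself is not specified here, and color information is not used in this sketch.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]  # grayscale pixel values, one inner list per row


def motion_mask(prev: Frame, curr: Frame, threshold: int = 20) -> List[List[bool]]:
    """Mark pixels whose value changed by more than `threshold` between frames."""
    return [
        [abs(c - p) > threshold for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev, curr)
    ]


def split_objects(
    curr: Frame, mask: List[List[bool]]
) -> Tuple[List[List[Optional[int]]], List[List[Optional[int]]]]:
    """Split a frame into a moving object and a static object.

    Pixels that do not belong to an object are set to None.
    """
    moving = [[px if m else None for px, m in zip(row, mrow)]
              for row, mrow in zip(curr, mask)]
    static = [[None if m else px for px, m in zip(row, mrow)]
              for row, mrow in zip(curr, mask)]
    return moving, static


prev_frame = [[10, 10, 10], [10, 10, 10]]
curr_frame = [[10, 200, 10], [10, 210, 12]]
mask = motion_mask(prev_frame, curr_frame)
foreground, background = split_objects(curr_frame, mask)
```

In this toy input only the middle column changes strongly, so it ends up in `foreground`, while the small change from 10 to 12 stays below the assumed threshold and remains in `background`.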
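The choice made by the object synthesizing unit 3 can be sketched as follows. This is a minimal model under stated assumptions: the `EncodedObject` type, the role labels `"person"` and `"background"`, and the three mode strings standing in for the control signal from the call control unit 9 are all hypothetical, and the encoded payloads are treated as opaque bytes rather than real MPEG-4 data.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class EncodedObject:
    role: str       # hypothetical label: "person" or "background"
    payload: bytes  # opaque object-encoded data


def synthesize(live: List[EncodedObject],
               stored_background: Optional[EncodedObject],
               mode: str) -> List[EncodedObject]:
    """Select what is handed to the line interface unit.

    mode (hypothetical stand-in for the control signal):
      "all"                - every live object, unchanged
      "person_only"        - only the person objects
      "replace_background" - person objects plus the pre-encoded background
    """
    if mode == "all":
        return list(live)
    persons = [obj for obj in live if obj.role == "person"]
    if mode == "person_only":
        return persons
    if mode == "replace_background":
        if stored_background is None:
            raise ValueError("no pre-encoded background available")
        return persons + [stored_background]
    raise ValueError(f"unknown mode: {mode}")


person = EncodedObject("person", b"caller")
office = EncodedObject("background", b"office")
beach = EncodedObject("background", b"beach")

combined = synthesize([person, office], beach, "replace_background")
```

Because the combination operates on already encoded objects, the live background (`office` here) is simply dropped and never reaches the line, which is what conceals the calling location.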

Then, the line interface unit 8 transmits the supplied data to the receiving terminal device, which is the communication partner, via a predetermined line.

On the other hand, when the video data is supplied, the recording medium 4 stores the video data. Thereafter, the video data stored in the recording medium 4 is appropriately used as video data (object) to be synthesized by the object synthesizing unit 3 in real time during communication.

When an audio signal from a microphone or the like is supplied, the audio adding unit 5 combines it with the audio signal obtained when the audio decoding unit 7 decodes the audio data in the recording medium 4, and supplies the combined audio signal to the audio encoding unit 6. The audio encoding unit 6 encodes the audio signal and supplies the encoded audio data to the object synthesizing unit 3 or the recording medium 4, or to both as necessary.

When the audio data is supplied, the object synthesizing unit 3 combines it with the above-described video data (object).

On the other hand, when the audio data is supplied, the recording medium 4 stores the audio data. The audio data stored in the recording medium 4 is then decoded by the audio decoding unit 7 in real time during communication, and the decoded audio signal is used as appropriate as an audio signal to be combined in the audio adding unit 5.

Further, the call control unit 9 controls the object synthesizing unit 3 and the recording medium 4 based on information such as the date and time of communication and the communication partner, and supplies the encoded video data and audio data to the object synthesizing unit 3 or the audio decoding unit 7. In this way, replacement of the background image can be executed, or not executed, only for a specific communication partner. In addition, the background image can be switched according to the communication partner, and the combination of background and sound can be selected according to the schedule, event, and time. Furthermore, a pre-stored image can be transmitted instead of the video of the place from which the call is currently being made, so that an answering machine function can be realized.

Also, the call control unit 9 exchanges control information with the communication partner, determines whether the partner's terminal device supports object coding, and can thereby automatically identify whether or not to perform transmission using this method.
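The addition performed by the audio adding unit 5 can be illustrated with a simple sample-wise mix. This sketch assumes 16-bit signed PCM samples held in Python lists, a hypothetical gain parameter for the stored audio, and looping of the stored clip when it is shorter than the live input; none of these details are specified here.

```python
from typing import List

PCM_MIN, PCM_MAX = -32768, 32767  # assumed 16-bit signed sample range


def add_audio(live: List[int], stored: List[int], stored_gain: float = 0.5) -> List[int]:
    """Mix live microphone samples with previously stored samples.

    The stored clip is looped if it is shorter than the live input,
    and the sum is clipped to the 16-bit range.
    """
    mixed = []
    for i, sample in enumerate(live):
        extra = int(stored[i % len(stored)] * stored_gain) if stored else 0
        mixed.append(max(PCM_MIN, min(PCM_MAX, sample + extra)))
    return mixed


mixed_samples = add_audio([1000, -2000, 32000], [400, 400, 4000])
looped_samples = add_audio([0, 0, 0], [10], stored_gain=1.0)
```

The third sample of `mixed_samples` would exceed the 16-bit range after mixing and is therefore clipped to the maximum value.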
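One way to picture the selection made by the call control unit 9 is a small rule table keyed on the communication partner and the time of day. The `Rule` type, the `"*"` wildcard, the hour-based matching, and all partner and object names below are hypothetical and serve only to illustrate the idea of choosing a stored background and sound per partner and per time.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Rule:
    partner: str     # partner identifier, or "*" for any partner
    start_hour: int  # inclusive
    end_hour: int    # exclusive
    background: str  # key of a stored background object
    audio: str       # key of stored audio data


def select_stored(partner: str, when: datetime,
                  rules: List[Rule]) -> Optional[Tuple[str, str]]:
    """Return (background, audio) keys for the first matching rule.

    None means that no replacement is performed for this call.
    """
    for rule in rules:
        if rule.partner in (partner, "*") and rule.start_hour <= when.hour < rule.end_hour:
            return rule.background, rule.audio
    return None


rules = [
    Rule("client-a", 9, 18, "office_bg", "office_noise"),
    Rule("*", 18, 24, "home_bg", "quiet_room"),
]

daytime = select_stored("client-a", datetime(2000, 12, 5, 10, 30), rules)
evening = select_stored("friend-b", datetime(2000, 12, 5, 20, 0), rules)
unmatched = select_stored("friend-b", datetime(2000, 12, 5, 10, 30), rules)
```

Returning `None` for an unmatched call models the case in which background replacement is not executed for that partner.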

As described above, according to the first embodiment, the video signal is object-coded on the transmitting side, a part or all of the coded objects are combined with an object coded in advance, and the combined video data is transmitted. Because the pre-encoded object of the background part is combined in real time with the object of the person part in the video, the video can be transmitted without the caller's calling location being identified on the receiving side.

Further, according to the first embodiment, switching the background to be combined based on date and time information makes it possible to conceal the caller's calling location more naturally.

Further, according to the first embodiment, an audio signal supplied from the outside is combined with an audio signal obtained in advance, and the audio data corresponding to the combined audio signal is transmitted together with the video data, so that the caller's calling location can be concealed more effectively. Also, according to the first embodiment, switching the audio to be combined based on date and time information makes it possible to conceal the caller's calling location more naturally.

Furthermore, according to the first embodiment, the call control unit 9 automatically identifies whether the terminal device of the communication partner supports object coding. For a specific destination, for example, instead of transmitting the background of the video encoded in real time, only the object of the person part is transmitted, and the object of the background part is combined by the receiving terminal device. This reduces the amount of data to be transmitted.

Furthermore, since the object encoded in advance is read from the recording medium 4, the object to be combined can be exchanged easily and its portability is improved; for example, it is possible to combine an object with the background of a place that has not been visited in the past.

Furthermore, since the video data and the audio data are generated by encoding in the MPEG-4 system, the present invention can be used widely once equipment compatible with the MPEG-4 system becomes widespread.

Embodiment 2.

FIG. 3 is a block diagram showing a configuration of a video reception / decoding device according to Embodiment 2 of the present invention. In the figure, reference numeral 11 denotes a line interface unit (stream receiving means) for receiving data transmitted via a line, and 12 denotes a data processing unit which converts the received data into an image data object and an audio data. This is the object separation unit that separates objects.

13 is a flash memory that stores the object separation unit 12, the object coding unit 20, the audio coding unit 21 and externally coded video data and coded audio data from outside, It is a recording medium (stream storage means) such as a disk-type recording medium (optical disk, magnetic disk, magneto-optical disk).

Reference numeral 14 denotes an object synthesizing unit which synthesizes a part or all of the video data from the object separating unit 12 and video data encoded in advance and stored in the recording medium 13. And 15 is an object decoding unit (media decoding unit) for decoding the video data from the object combining unit 14.

Reference numeral 16 denotes an audio decoding unit (media decoding means) for decoding audio data from the object separation unit 12, and reference numeral 17 denotes audio for decoding pre-encoded audio data stored in the recording medium 13. A decoding unit (media decoding means) 18 is a speech adding unit (speech synthesis means) for synthesizing and outputting the speech signal from the speech decoding unit 16 and the speech signal from the speech decoding unit 17. .

Reference numeral 19 denotes an object cutout unit that processes a video signal from a camera that captures images using an image sensor such as a CCD and divides the image data into objects, and 20 denotes an object cutout unit. 19 is an object coding section (media coding means) for subjecting the video signal to a predetermined object coding method such as the MPEG-4 method based on the data from 19; This is an audio encoding unit (media encoding means) that encodes the audio signal from the audio signal according to a predetermined method.

Reference numeral 22 denotes a call control unit (control means) for controlling the recording medium 13 and the object synthesizing unit 14 in accordance with the received control information and the transmitting device of the communication partner. Next, the operation will be described.

The line interface 11 receives the data transmitted via the line, and the object separation unit 12 separates the data into video data and audio data. The video data is supplied to the recording medium 13 or the object synthesizing unit 14 or both, and the audio data is supplied to the recording medium 13 or the audio decoding unit 16 or both. . The video data and the audio data supplied to the recording medium 13 are stored. The video data and audio data stored in the recording medium 13 are appropriately used as data for synthesizing video data and audio data received thereafter in real time.

Next, the object synthesizing unit 14 synthesizes a part or all of the video data and the video data stored in the recording medium 13 in accordance with the control signal from the call control unit 22, and synthesizes them. The subsequent video data is supplied to the object decoding unit 15. The object decoding unit 15 decodes the video data from the object synthesizing unit 14 and outputs the decoded video signal.

For example, when the object-encoded video data is composed of an object of a person part and an object of a background part, the object of the person part and other objects stored in the recording medium 13 are stored. The object with the background part is synthesized.

Further, for example, when the object-encoded video data is composed of only the object of the person, the object of the person and the object of the background stored in the recording medium 13 are used. Are synthesized. On the other hand, when supplied with the audio data, the audio decoding unit 16 decodes the audio data and supplies the decoded audio signal to the audio adding unit 18. The audio decoding unit 17 decodes the pre-encoded audio data stored in the recording medium 13 and supplies the decoded audio signal to the audio adding unit 18. And voice The adder 18 combines the audio signal from the audio decoder 16 with the audio signal from the audio decoder 17 and outputs the synthesized audio signal.
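The two video cases described above, in which the received data contains either a person object together with a background object or a person object alone, lead to the same result on the receiving side. The following sketch shows this under stated assumptions: objects are modeled as plain dictionaries with hypothetical `role` and `data` keys, and `combine_on_receive` is an illustrative name, not a function of any real decoder.

```python
from typing import Dict, List

Obj = Dict[str, str]  # hypothetical shape: {"role": ..., "data": ...}


def combine_on_receive(received: List[Obj], stored_background: Obj) -> List[Obj]:
    """Keep the received person objects and pair them with the stored background.

    Any background object that arrived with the stream is discarded.
    """
    persons = [obj for obj in received if obj["role"] == "person"]
    return persons + [stored_background]


caller = {"role": "person", "data": "caller"}
sent_bg = {"role": "background", "data": "street"}
local_bg = {"role": "background", "data": "stored_room"}

with_background = combine_on_receive([caller, sent_bg], local_bg)
person_only = combine_on_receive([caller], local_bg)
```

In the second case nothing has to be discarded, which corresponds to the situation in which the transmitting side sends only the person part and less data crosses the line.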

In addition, video data encoded by the object extraction unit 19 and the object encoding unit 20 can be stored in the recording medium 13 and used as data to be combined with received video data in real time. Likewise, audio data encoded by the audio encoding unit 21 can be stored in the recording medium 13 and used as data to be combined with received audio data in real time.

Further, the call control unit 22 controls the recording medium 13 and the object synthesizing unit 14 based on the communication date and time and on information about the communication partner, so that the pre-encoded video data and audio data are supplied to the object synthesizing unit 14 and the audio decoding unit 17, respectively. This makes it possible to perform, or not perform, the replacement of the background image only for a specific communication partner. In addition, the background image can be switched according to the communication partner, and the combination of background and sound can be selected according to a schedule, an event, or the time of day.
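The selection performed by the call control unit 22 can be sketched as a rule lookup keyed on the communication partner and the communication time. This is only an illustration under stated assumptions: the `Rule` structure, the hour-based time windows, the first-match policy, and all partner and object names are hypothetical and do not appear in the description.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Rule:
    partner: Optional[str]     # None matches any communication partner
    start_hour: int            # inclusive, 0..23
    end_hour: int              # exclusive, 1..24
    background: Optional[str]  # None means "do not replace the background"
    sound: Optional[str]       # None means "do not add stored sound"


def select_stored_objects(rules, partner, when):
    """Return (background, sound) from the first matching rule."""
    for rule in rules:
        if rule.partner not in (None, partner):
            continue
        if rule.start_hour <= when.hour < rule.end_hour:
            return rule.background, rule.sound
    return None, None  # no rule matched: leave video and audio unchanged


# Hypothetical rule table: no replacement for one specific partner,
# time-dependent backgrounds for everyone else.
RULES = [
    Rule("alice", 0, 24, None, None),
    Rule(None, 9, 18, "office_bg", "office_noise"),
    Rule(None, 18, 24, "living_room_bg", None),
]

print(select_stored_objects(RULES, "bob", datetime(2000, 12, 5, 10, 30)))
# prints ('office_bg', 'office_noise')
```

Placing the partner-specific rule first lets it override the time-based defaults, which corresponds to enabling or disabling background replacement for a specific communication partner.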

Also, the call control unit 22 can communicate with the transmitting terminal device of the communication partner, automatically identify whether or not that device performs transmission according to the present method, and execute the corresponding receiving process. Further, a control signal may be supplied as appropriate from the transmitting side to the receiving side so that the video data sent from the transmitting side at the start of communication is stored in the recording medium 13; thereafter, only the video data of the person portion is transmitted from the transmitting side, and the background portion is taken from the video data stored in the recording medium 13 at the start of communication and combined with it. At this time, video and audio data selected according to a schedule, event, or timetable based on date and time information may also be combined.

As described above, according to the second embodiment, object-encoded video data is received from the transmitting side, part or all of the received objects are combined with an object encoded in advance, and the combined video data is decoded. By combining a pre-encoded background object with the person object in the video in real time, the video can be presented so that the calling place of the caller is not identified on the receiving side.

Further, according to the second embodiment, by switching the background to be synthesized based on the date and time information, it is possible to obtain a more natural effect of concealing the calling place of the caller.

That is, the object synthesizing unit 14 replaces, in real time, the background portion (everything other than the person) of the received object-encoded video data with the background portion of video data previously stored in the recording medium 13. As a result, the displayed background differs from the place from which the video is actually being transmitted, so that even if the transmitting side lacks the background-replacement function described in Embodiment 1, it becomes difficult for the receiving side to identify the place from which the video is being transmitted.
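The background replacement described here operates on encoded objects, before decoding, and can be sketched as follows. The sketch assumes that each object carries a label identifying it as a person or a background and treats the encoded payload as opaque bytes; the `VideoObject` type, the labels, and the function name `synthesize` are hypothetical and are not taken from the description.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class VideoObject:
    kind: str       # "person" or "background" (labels are assumed)
    payload: bytes  # encoded object data, treated as opaque here


def synthesize(received: List[VideoObject],
               stored_background: VideoObject) -> List[VideoObject]:
    """Drop any received background and substitute the stored one.

    This covers both cases in the description: the received data holds
    a person and a background, or it holds the person only.
    """
    foreground = [obj for obj in received if obj.kind != "background"]
    # The background is listed first so that it is drawn behind the person.
    return [stored_background] + foreground


stored = VideoObject("background", b"stored-office")
received = [VideoObject("person", b"caller"),
            VideoObject("background", b"actual-location")]

scene = synthesize(received, stored)
print([(obj.kind, obj.payload) for obj in scene])
# prints [('background', b'stored-office'), ('person', b'caller')]
```

Because the received background object is simply discarded rather than decoded, the actual location never reaches the display in this sketch.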

Further, according to the second embodiment, since the audio signal obtained by decoding the audio data from the transmitting side is combined with an audio signal obtained in advance, the calling place of the caller can be further prevented from being identified on the receiving side.

Further, according to the second embodiment, by switching the voice to be synthesized based on the date and time information, it is possible to obtain a more natural effect of concealing the calling place of the caller.

Further, according to the second embodiment, only the object of the person portion, which is a part of the video, is received from the transmitting side, and the background portion of a video object that has been encoded in advance is combined with it in real time. Consequently, only some of the objects need to be transmitted, and the amount of data to be transmitted is reduced.

Embodiment 3.

FIG. 4 is a block diagram showing the configuration of a video transmitting/receiving device according to Embodiment 3 of the present invention. In the figure, reference numerals 31 to 38 denote an object extracting unit through a line interface unit similar to the object extracting unit 1 through the line interface unit 8 of the first embodiment, respectively. Reference numerals 41 to 44 denote an object separation unit, an object synthesizing unit, an object decoding unit, and an audio decoding unit similar to the object separation unit 12, the object synthesizing unit 14, the object decoding unit 15, and the audio decoding unit 16 of the second embodiment, respectively. Reference numeral 39 denotes a call control unit having the functions of both the call control unit 9 of the first embodiment and the call control unit 22 of the second embodiment.

It should be noted that the object extracting unit 31, the object encoding unit (media encoding means) 32, the object synthesizing unit (transmission stream synthesizing means) 33, the recording medium (stream storage means) 34, the audio adding unit (audio synthesizing means) 35, the audio encoding unit (media encoding means) 36, the audio decoding unit (media decoding means) 37, the line interface unit (stream transmission means) 38, and the call control unit 39 constitute a transmission processing unit.

The line interface unit (stream receiving means) 38, the object separation unit 41, the recording medium (stream storage means) 34, the object synthesizing unit (receiving stream synthesizing means) 42, the object decoding unit (media decoding means) 43, the audio decoding unit (media decoding means) 44, the audio adding unit 35, the audio decoding unit (media decoding means) 37, and the call control unit 39 constitute a reception processing unit. That is, the recording medium 34, the audio adding unit 35, the audio decoding unit 37, and the line interface unit 38 are shared by the transmission processing unit and the reception processing unit.

Further, the video transmitting/receiving device shown in the figure can be realized by adding an object extracting unit 31, an object encoding unit 32, an object synthesizing unit 33, and an audio encoding unit 36 to a video receiving/decoding device. That is, the video transmitting/receiving device can be easily realized by making small changes to the video receiving/decoding device.

Next, the operation will be described.

The transmission processing section operates in the same manner as the video encoding / transmission apparatus according to the first embodiment, and the reception processing section operates in the same manner as the video reception / decoding apparatus according to the second embodiment.

As described above, according to the third embodiment, since the transmission processing unit and the reception processing unit described above are provided, two-way communication can be performed, and the same effects as those of Embodiment 1 and Embodiment 2 can be obtained.

Further, according to the third embodiment, since a part of the transmission processing unit and a part of the reception processing unit are shared, the same effects as those of Embodiment 1 and Embodiment 2 can be obtained without greatly increasing the circuit scale.

Embodiment 4.

FIG. 5 is a diagram showing an example of a network provided with a video transmission system according to Embodiment 4 of the present invention, and FIG. 6 is a diagram showing a configuration of a video transmission system according to Embodiment 4 of the present invention.

In FIG. 5, reference numerals 61 to 63 denote terminal devices, each connected to a network 64 by a predetermined line (for example, a public telephone line or a mobile telephone line) and each having a video encoding and transmitting device similar to that of the first embodiment.

In FIG. 6, reference numeral 71 denotes a video encoding and transmitting device, similar to that of the first embodiment, which processes a video signal from an imaging device 72 such as a CCD camera and an audio signal from a sound collection device 73 such as a microphone, and transmits video data and audio data to another terminal device. Reference numeral 74 denotes a receiving device that receives video data and audio data from another terminal device through the line interface unit 77, decodes the data in the decoding unit 78, and outputs the video signal to a display device 75 such as a display and the audio signal to an audio output device 76 such as a speaker.

Next, the operation will be described.

In each of the terminal devices 61 and 62, the video signal and the audio signal are encoded by the video encoding and transmitting device 71 in the same manner as in the first embodiment, and the encoded data is transmitted via the network 64 to the other terminal devices 62 and 61, respectively. The data is received by the receiving device 74 of the other terminal device and is decoded into a video signal and an audio signal.

As described above, according to the fourth embodiment, since the video encoding and transmitting device according to the first embodiment is used in a video transmission system, the effects of Embodiment 1 can be obtained in a video transmission system that transmits and receives video and audio remotely.

Embodiment 5.

FIG. 7 is a block diagram showing a configuration of a video transmission system according to Embodiment 5 of the present invention. In the figure, reference numeral 81 denotes a transmitting device in which a video signal from an imaging device 72 such as a CCD camera and an audio signal from a sound collection device 73 such as a microphone are object-encoded by an encoding unit 82, and the resulting video data and audio data are transmitted to another terminal device by a line interface unit 83. Reference numeral 84 denotes a video receiving/decoding device, similar to that of the second embodiment, which processes video data and audio data from another terminal device and outputs a video signal to the display device 75 and an audio signal to the audio output device 76.

Next, the operation will be described.

In each of the terminal devices 61 and 62, the video signal and the audio signal are object-encoded by the transmitting device 81, and the encoded data is transmitted to the other terminal devices 62 and 61, respectively. The data is then received by the video receiving/decoding device 84 in the other terminal device in the same manner as in the second embodiment, and is decoded into a video signal or an audio signal. At this time, if only a part of the video data is transmitted from the transmitting device 81, the amount of transmission data is reduced.
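The data reduction obtained by transmitting only part of the objects can be shown with a small sketch. The object names and byte sizes below are purely hypothetical and chosen only to make the arithmetic visible; the description gives no figures, and `select_for_transmission` is an illustrative helper, not part of the described device.

```python
def select_for_transmission(objects, kinds_to_send=("person",)):
    """Keep only the encoded objects that should be sent (hypothetical helper)."""
    return {kind: data for kind, data in objects.items() if kind in kinds_to_send}


# Hypothetical encoded sizes for one frame; not taken from the description.
frame = {"person": bytes(1200), "background": bytes(4800)}

sent = select_for_transmission(frame)
full_size = sum(len(data) for data in frame.values())
sent_size = sum(len(data) for data in sent.values())
print(f"{sent_size} of {full_size} bytes sent ({100 * sent_size // full_size}%)")
# prints 1200 of 6000 bytes sent (20%)
```

The receiving side then supplies the missing background from its own recording medium, as in the second embodiment.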

As described above, according to the fifth embodiment, since the video receiving/decoding device according to the second embodiment is used in a video transmission system, the effects of Embodiment 2 can be obtained in a video transmission system that transmits and receives video and audio remotely.

Note that, instead of the transmitting device 81 and the video receiving/decoding device 84 in the fifth embodiment, the video transmitting/receiving device according to the third embodiment may be provided.

Industrial applicability

As described above, the video encoding/transmitting device, the video receiving/decoding device, the video transmitting/receiving device, and the video transmission system according to the present invention are suitable for transmitting video so that the caller's calling place is not identified on the receiving side, and for reducing the amount of data to be transmitted.

Claims

1. A video encoding and transmitting device comprising: media encoding means for object-encoding a video signal supplied from the outside; transmission stream synthesizing means for synthesizing a part or all of the objects encoded by the media encoding means with an object that has been object-encoded in advance; and stream transmission means for transmitting the video data synthesized by the transmission stream synthesizing means.

2. The video encoding and transmitting apparatus according to claim 1, further comprising stream storage means for recording an object which has been previously encoded.

3. The video encoding and transmitting device according to claim 2, wherein the transmission stream synthesizing means performs a synthesizing process on the video data output from the stream storage means with the video data encoded by the media encoding means as a background.

4. The video encoding and transmitting apparatus according to claim 3, wherein the video data is a moving image.

5. The video encoding and transmitting apparatus according to claim 3, wherein the video data is a still image.

6. The video encoding and transmitting apparatus according to claim 1, further comprising control means for controlling a transmission stream synthesizing means according to a transmission destination.

7. The video encoding and transmitting apparatus according to claim 2, further comprising control means for controlling a transmission stream synthesizing means according to a transmission destination.

8. The video encoding and transmitting device according to claim 1, further comprising audio synthesizing means for synthesizing an audio signal supplied from the outside and an audio signal obtained in advance, wherein the stream transmission means transmits, together with the video data, audio data corresponding to the audio signal synthesized by the audio synthesizing means.

9. The video encoding and transmitting device according to claim 1, further comprising audio synthesizing means for synthesizing an audio signal supplied from the outside and an audio signal obtained in advance, wherein the transmission stream synthesizing means synthesizes, together with the video data, audio data corresponding to the audio signal synthesized by the audio synthesizing means.

10. The video encoding transmission device according to claim 2, wherein the transmission stream synthesizing unit reads out an object that has been subjected to object encoding in advance from the stream storage unit.

11. The video encoding and transmitting apparatus according to claim 10, wherein the audio data is output from a stream storage unit.

12. The video encoding and transmitting device according to claim 2, wherein the stream storage means records one or both of video data and audio data that have been encoded in advance.

13. The video encoding and transmitting device according to claim 2, further comprising audio synthesizing means for synthesizing an audio signal supplied from the outside and an audio signal obtained in advance, wherein the transmission stream synthesizing means synthesizes audio data corresponding to the audio signal synthesized by the audio synthesizing means with the video data output from the stream storage means.

14. The video encoding and transmitting device according to claim 7, wherein the control means selects, based on a communication partner, an object to be output from the stream storage means, which records a plurality of object-encoded objects.

15. The video encoding and transmitting device according to claim 7, wherein the control means selects, based on a communication date and time, an object to be output from the stream storage means, which records a plurality of object-encoded objects.

16. The video encoding and transmitting apparatus according to claim 1, wherein the video data is encoded by an MPEG-4 system.

17. The video encoding and transmitting apparatus according to claim 8, wherein the audio data is encoded according to the MPEG-4 system.

18. A video receiving and decoding device comprising: stream receiving means for receiving object-encoded video data; receiving stream synthesizing means for synthesizing a part or all of the objects in the video data received by the stream receiving means with an object that has been object-encoded in advance; and media decoding means for decoding the video data synthesized by the receiving stream synthesizing means.

19. The video reception / decoding device according to claim 18, further comprising a stream storage unit that records an object that has been previously encoded.

20. The video receiving and decoding device according to claim 19, wherein the receiving stream synthesizing means performs a synthesizing process on the video data output from the stream storage means with the video data received by the stream receiving means as a background.

21. The video receiving and decoding device according to claim 20, wherein the video data is a moving image.

22. The video receiving / decoding apparatus according to claim 20, wherein the video data is a still image.

23. The video receiving and decoding device according to claim 18, wherein the receiving stream synthesizing means synthesizes the object of the person portion received by the stream receiving means with the object of the background portion that has been object-encoded in advance.

24. The video receiving and decoding apparatus according to claim 19, further comprising control means for controlling a receiving stream combining means according to a transmission source.

25. The video receiving and decoding device according to claim 18, wherein the stream receiving means receives audio data together with the video data, the device further comprising audio synthesizing means for synthesizing an audio signal corresponding to the audio data received by the stream receiving means with an audio signal acquired in advance.

26. The video receiving and decoding device according to claim 18, further comprising audio synthesizing means for synthesizing an audio signal received from the stream receiving means and an audio signal acquired in advance, wherein the receiving stream synthesizing means synthesizes, together with the video data, audio data corresponding to the audio signal synthesized by the audio synthesizing means.

27. The video receiving / decoding apparatus according to claim 19, wherein the receiving stream synthesizing means reads out an object which has been subjected to object encoding in advance from the stream storing means.

28. The video reception / decoding device according to claim 26, wherein the audio data is output from the stream storage means.

29. The video receiving and decoding device according to claim 19, wherein the stream storage means records one or both of video data and audio data that have been encoded in advance.

30. The video receiving and decoding device according to claim 19, further comprising audio synthesizing means for synthesizing an audio signal received from the stream receiving means and an audio signal acquired in advance, wherein the receiving stream synthesizing means synthesizes audio data corresponding to the audio signal synthesized by the audio synthesizing means with the video data output from the stream storage means.

31. The video receiving and decoding device according to claim 19, further comprising audio synthesizing means for synthesizing an audio signal received from the stream receiving means and an audio signal acquired in advance, wherein the receiving stream synthesizing means synthesizes audio data corresponding to the audio signal synthesized by the audio synthesizing means with the video data output from the stream storage means, and the synthesized audio data and video data are stored in the stream storage means.

32. The video receiving and decoding device according to claim 24, wherein the control means selects, based on a communication partner, an object to be output from the stream storage means, which records a plurality of object-encoded objects.

33. The video receiving and decoding device according to claim 24, wherein the control means selects, based on a communication date and time, an object to be output from the stream storage means, which records a plurality of object-encoded objects.

34. The video receiving / decoding apparatus according to claim 18, wherein the video data is coded according to the MPEG-4 method.

35. The video receiving / decoding apparatus according to claim 25, wherein the audio data is encoded by the MPEG-4 system.

36. A video transmitting and receiving device comprising: a transmission processing unit having media encoding means for object-encoding one or both of a video signal and an audio signal supplied from the outside, transmission stream synthesizing means for synthesizing a part or all of the objects encoded by the media encoding means with an object encoded in advance, and stream transmission means for transmitting one or both of the video data and audio data synthesized by the transmission stream synthesizing means; and a reception processing unit having stream receiving means for receiving one or both of object-encoded video data and audio data, receiving stream synthesizing means for synthesizing an object in one or both of the video data and audio data received by the stream receiving means with an object encoded in advance, and media decoding means for decoding one or both of the video data and audio data synthesized by the receiving stream synthesizing means.

37. A video transmission system comprising: a video encoding and transmitting device having media encoding means for object-encoding one or both of a video signal and an audio signal supplied from the outside, transmission stream synthesizing means for synthesizing a part or all of the objects encoded by the media encoding means with an object encoded in advance, and stream transmission means for transmitting one or both of the video data and audio data synthesized by the transmission stream synthesizing means; and a receiving device for receiving and decoding one or both of the video data and the audio data from the video encoding and transmitting device.

38. A video transmission system comprising: a transmitting device that object-encodes one or both of a video signal and an audio signal supplied from the outside and transmits a part of one or both of the object-encoded video data and audio data; and a video receiving and decoding device having stream receiving means for receiving one or both of the object-encoded video data and audio data from the transmitting device, receiving stream synthesizing means for synthesizing an object in one or both of the video data and audio data received by the stream receiving means with an object encoded in advance, and media decoding means for decoding one or both of the video data and audio data synthesized by the receiving stream synthesizing means.