WO2021155812A1 - Receiving device, server, and voice information processing system - Google Patents

Receiving device, server, and voice information processing system

Info

Publication number
WO2021155812A1
WO2021155812A1 (PCT/CN2021/075126)
Authority
WO
WIPO (PCT)
Prior art keywords
scene
information
voice
receiving device
server
Prior art date
Application number
PCT/CN2021/075126
Other languages
English (en)
Chinese (zh)
Inventor
山本澄彦
村上雅俊
岡野和幸
加藤雅也
堤竹秀行
辻雅史
西口友美
内野聡
Original Assignee
海信视像科技股份有限公司
东芝视频解决方案株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from JP2020019997A (JP7181907B2)
Priority claimed from JP2020019686A (JP7272976B2)
Priority claimed from JP2020155675A (JP7463242B2)
Application filed by 海信视像科技股份有限公司, 东芝视频解决方案株式会社
Priority to CN202180001659.7A (CN113498538A)
Publication of WO2021155812A1

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue

Definitions

  • This embodiment relates to a receiving device, a server, and a voice information processing system.
  • Devices can be operated and controlled by voice instructions through smart speakers that use voice recognition technology.
  • However, a smart speaker must be activated by a trigger word before a voice command can be used.
  • CMs (commercial messages) for goods and services are broadcast within programs such as television broadcasts.
  • the program introduces products and other items.
  • Users, as viewers, can record programs on television receivers, video recorders, and the like, but finding the CM scene for a specific product, or the segment that introduces a specific product, within the recorded programs requires playing back the recordings, which is very troublesome.
  • Although a system has been proposed that provides information on products and the like introduced in programs to portable electronic terminals, the user has to operate the portable terminal and display the product information on its screen, which is more complicated.
  • Notification services are also being provided that recommend information on products and services to terminal devices, such as smartphones and tablet PCs, that have established communication with a receiver, based on the status of receivers such as television receivers and video recorders.
  • In such a notification service for recommended information, the status of the receiver is sent from the receiver to a management server via the Internet, and the recommendation information corresponding to that status is sent from the management server to the terminal device via the Internet. There is therefore a time lag until the terminal device receives the recommendation information corresponding to the state of the receiver.
  • Moreover, the user frequently changes the state of the receiver, for example by changing the channel of the TV currently being watched or by stopping playback of a recorded program and watching a TV program in real time. Therefore, by the time the terminal device receives the recommendation information corresponding to the status of the receiver, the user may already have changed that status. That is, in the conventional notification service, when the status of the receiver changes frequently, there is a concern that recommendation information, such as a product entirely unrelated to the current status of the receiver, is displayed on the terminal device.
  • Patent Document 1: JP 2019-207286 A
  • Patent Document 2: JP 2020-122819 A
  • Patent Document 3: JP 2012-92443 A
  • Patent Document 4: Japanese Patent No. 5668013
  • However, the voice command issued by the user may not correspond to the image scene that interested the viewer at that moment.
  • Therefore, an object of the present application is to provide a receiving device, a server, and a voice information processing system that process voice instructions for an image scene specified by the user.
  • Another object of the present application is to provide a recorded-scene display device with which a user who is a viewer of programs can easily watch a scene introducing a desired product or the like from among a plurality of recorded programs.
  • Yet another object of the present application is to provide an information communication system, a receiving device, a terminal device, a display control method, and a display control program capable of displaying appropriate recommendation information consistent with the state of the receiving device. The display control program may take the form of a set of computer instructions that can be run on a computer device to make the computer device perform a predetermined method or function.
  • The receiving device of the present application includes: a control signal receiving unit that receives a scene designation signal, which is a control signal for designating a scene of the image while image content is being output from a display unit; and a control unit that generates a start command for activating a voice command acquisition mechanism that receives voice, performs voice recognition on the voice, and obtains a command.
  • The recorded-scene display device of the present application has: a display information generating unit that generates display information of scene information associated with recorded programs, the scene information relating to scenes that include at least one of goods and services; and a scene playback processing unit that generates playback information of a scene selected from the scene information according to a playback instruction.
  • the information communication system of the present application has a receiving device, a management server, and a terminal device.
  • the receiving device generates first state information indicating its current state.
  • the management server generates recommendation information associated with the first state information based on the first state information.
  • The terminal device acquires the first state information and the recommendation information associated with the first state information, acquires second state information indicating the state of the receiving device at a time corresponding to the acquisition of at least one of the first state information and the recommendation information, compares the first state information with the second state information, and executes display processing of the recommendation information when the first state information at least partially matches the second state information.
  • FIG. 1 is a diagram showing a configuration example of a system according to an embodiment.
  • FIG. 2 is a diagram schematically showing the structure of a receiving device.
  • FIG. 3 is a block diagram showing a configuration example of a smart device.
  • FIG. 4 is a block diagram showing a configuration example of a server.
  • FIG. 5 is a sequence diagram showing an example of the operation of the system according to the first embodiment.
  • FIG. 6 is a flowchart showing an example of the operation of the system according to the embodiment.
  • FIG. 7 is a diagram showing an example of the operation of the system according to the embodiment.
  • FIG. 8 is a sequence diagram of the system according to the second embodiment.
  • FIG. 9 is a diagram showing an example of a data flow in the system according to the embodiment.
  • FIG. 10 is a diagram showing a first data flow example in the system according to a modification example.
  • FIG. 11 is a diagram showing a second data flow example in the system according to the modification example.
  • FIG. 12 is a diagram showing a third data flow example in the system according to the modification example.
  • FIG. 13 is a block diagram showing the structure of a server according to another embodiment.
  • FIG. 14 is a structural diagram of program information according to another embodiment.
  • FIG. 15 is a diagram showing multiple time periods of multiple scenes introducing multiple commodities (or services) included in one program content according to another embodiment.
  • FIG. 16 is a structural diagram of a video content display system according to another embodiment.
  • FIG. 17 is a flowchart for explaining the operation of a terminal device according to another embodiment.
  • FIG. 1 is a diagram showing a configuration example of the system according to the embodiment.
  • The receiving device 100 is, for example, a receiving device for digital television broadcasting (also referred to as a television receiving device), and receives, from an antenna, cable broadcasting, or the like (not shown), broadcast signals of 4K/8K broadcasting such as high-bandwidth satellite digital broadcasting, as well as existing 2K broadcast signals such as terrestrial digital broadcasting, BS digital broadcasting, and CS digital broadcasting. Broadcast signals of various digital broadcasts such as 4K/8K broadcasts and 2K broadcasts may be collectively referred to as various broadcast signals.
  • the receiving device 100 obtains content-related data (referred to as content data) such as image signals, voice signals, and text signals from broadcast signals, and provides content to users.
  • Alternatively, instead of acquiring a broadcast signal, the receiving device 100 may acquire image data for digital television broadcasting or the like from a storage medium such as a DVD or a hard disk, or from a content server (not shown) on the Internet.
  • The remote controller 200 is a remote controller attached to the receiving device 100, and remotely controls the receiving device 100, for example turning the power on and off and switching channels.
  • A control signal based on infrared rays or the like (referred to as a remote control signal) is output from the remote controller 200 to the receiving device 100.
  • a scene specifying button 201 is provided in the remote controller 200 in this embodiment.
  • When the scene designation button 201 is operated, the remote control signal corresponding to the scene designation button 201 (referred to as a scene designation signal) is output. If the receiving device 100 receives the scene designation signal, it determines the scene (image frame) of the content (image, voice, text, etc.) being output from the display 170, the speaker 171, and the like at the timing of reception, and acquires viewing content information related to the scene and scene designation time data.
  • A scene is basically a momentary image, that is, one frame of the image.
  • For the user, however, a scene may not be a single frame but an image with a time width of a few seconds.
  • the so-called viewing content information is information used to determine what the content is, such as the channel on which the content being output is broadcast.
  • The scene designation time data is time information such as the broadcast time of the designated scene. The combination of the viewing content information and the scene designation time data is called scene determination information.
  • The receiving device 100 may also store the acquired scene determination information in a memory or the like.
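  • As a concrete illustration, the scene determination information could be modeled as follows. This is a minimal Python sketch; the class and field names are illustrative assumptions, not taken from the patent.

```python
from dataclasses import dataclass

@dataclass
class ViewingContentInfo:
    """Identifies what the output content is, e.g. the broadcast channel."""
    channel: str
    program_name: str

@dataclass
class SceneDeterminationInfo:
    """Viewing content information plus scene designation time data."""
    content: ViewingContentInfo
    scene_time: str  # e.g. broadcast time of the designated scene, ISO 8601
```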
  • An existing button such as an "OK" button may serve as the scene specifying button 201.
  • The scene specifying function may be assigned to an existing button by, for example, updating the firmware of the remote controller 200.
  • The button does not need to be on the remote controller 200 attached to the receiving device 100; a button device dedicated to the scene specifying button 201 or the like may be used.
  • a dedicated button device may be connected to the remote controller 200.
  • the receiving device 100 may store data of instantaneous images (image frames) of a certain scene in a memory or the like.
  • the smart device 300 may be able to receive the scene designation signal output by the remote controller 200.
  • The smart device 300 is a smart speaker that has a built-in speaker, microphone, camera, voice recognition mechanism, and the like, and receives voice through the microphone.
  • the voice recognition mechanism can extract instructions superimposed on the voice from the received voice.
  • The smart device 300 has an interface with external devices and can exchange data with them.
  • the smart device 300 includes interfaces to connect to the receiving device 100, the remote controller 200, and the network 500.
  • When the smart device 300 receives a "question" through voice, it can obtain an "answer" to the "question" from an artificial intelligence engine (AI engine) or the like on the network 500.
  • the smart device 300 may also have an AI engine.
  • The server 400 is a server that provides information related to the content being watched (also referred to as viewing-related information), and may be, for example, a cloud server.
  • The server 400 exchanges data with the receiving device 100 and the smart device 300 via the network 500. When the server 400 in this embodiment receives the scene determination information and an instruction from the receiving device 100 and the smart device 300, it performs processing based on the instruction on the scene determined from the scene determination information.
  • the server 400 outputs the processing result to the receiving device 100 and the smart device 300. For example, the server 400 generates an “answer” to the “question” received from the smart device 300 and outputs it to the smart device 300.
  • the network 500 is an electrical communication line, such as the Internet.
  • FIG. 2 is a block diagram schematically showing the structure of the receiving device 100.
  • the receiving device 100 includes a basic function 160 as a function of receiving broadcast waves, a system control unit 161, a communication control unit 162, and an application control unit 163. In addition, the receiving device 100 is connected to a display 170 and a speaker 171.
  • The basic function 160 includes a broadcast tuner 101, a demultiplexer 102, a descrambler 103, an image decoder 104, a speech decoder 105, a subtitle decoder 106, a cache data unit 107, and a transmission control signal analysis unit 111.
  • the broadcast tuner 101 demodulates a stream (broadcast signal) sent through broadcast waves.
  • the demodulated stream (broadcast signal) is input to the demultiplexer 102.
  • The demultiplexer 102 separates the input multiplexed stream into an image stream, a voice stream, a subtitle stream, application data, and a transmission control signal; the image stream, voice stream, subtitle stream, and application data are input to the descrambler 103.
  • the transmission control signal is input to the transmission control signal analysis unit 111.
  • The descrambler 103 descrambles each stream as needed, and inputs the image stream to the image decoder 104, the voice stream to the speech decoder 105, the subtitle stream to the subtitle decoder 106, and the application data to the cache data unit 107.
  • the image stream is decoded by the image decoder 104, the voice stream is decoded by the speech decoder 105, and the subtitle stream is decoded by the subtitle decoder 106.
  • the transmission control signal analysis unit 111 analyzes various control information included in transmission control signals, SI information (Signaling Information), and the like.
  • the transmission control signal analysis unit 111 also sends control information related to application data, namely MH-AIT, data transmission messages, etc., among the analyzed transmission control signals to the application control unit 163 for further analysis.
  • the transmission control signal analysis unit 111 extracts viewing content information related to the content in the broadcast and the like from various control information such as the transmission control signal and the SI signal, and stores it in a memory (not shown) or the like.
  • The application control unit 163 manages and controls the control information related to application data, such as MH-AIT and data transmission messages, sent from the transmission control signal analysis unit 111.
  • the application control unit 163 uses the cached data stored in the cache data unit 107 to control the browser 164 to perform screen display control of the data broadcast.
  • The browser 164 also generates screen superimposition data for subtitles based on the output data of the subtitle decoder 106.
  • the decoded image signal and display content (content) such as subtitles and data broadcasting are synthesized by the synthesizer 165 and output to the display 170.
  • the voice data decoded by the voice decoder 105 is output to the speaker 171.
  • The codec type of the image decoder 104 is, for example, H.265, but it is not limited to this; it may be MPEG-2, H.264, or another codec.
  • The system control unit 161 controls various functions of the receiving device 100 based on control signals from external devices and the like received through the communication control unit 162. For example, when the system control unit 161 receives a scene designation signal from the remote control I/F 162-2 of the communication control unit 162, it generates a control signal that activates (turns on) the voice detection function of the smart device 300 or its command acquisition function based on voice recognition (also referred to as the voice command acquisition function), and sends that control signal to the smart device 300. In addition, when the system control unit 161 receives the scene designation signal, it determines the scene of the content being output from the display 170, the speaker 171, and the like.
  • The system control unit 161 may determine the scene designation time data with, for example, a clock (not shown) in the receiving device 100, or based on time information included in the broadcast signal.
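  • The reaction of the system control unit 161 to a scene designation signal might look like the following sketch. Everything here (class names, the message format, the send interface) is an assumption for illustration.

```python
import datetime

class SystemControlUnit:
    """Hypothetical sketch of how the system control unit 161 might react
    to a scene designation signal; names and interfaces are illustrative."""

    def __init__(self, smart_device_if, scene_store):
        self.smart_device_if = smart_device_if  # stands in for smart device I/F 162-3
        self.scene_store = scene_store          # memory for scene designation data

    def on_scene_designation_signal(self, channel, program_name):
        # Determine the scene being output at the moment of reception,
        # here with a local clock (broadcast time information also works).
        self.scene_store.append({
            "channel": channel,
            "program": program_name,
            "scene_time": datetime.datetime.now(),
        })
        # Activate the smart device's voice command acquisition function.
        self.smart_device_if.send({"command": "VOICE_DETECT_ON"})
```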
  • the communication control unit 162 includes various interfaces.
  • The network I/F 162-1 is an interface to the network 500.
  • the communication control unit 162 can be connected to the server 400 via the network I/F 162-1 and the network 500.
  • the communication control unit 162 can acquire applications and contents managed by a service provider device (not shown) via the network.
  • the obtained application program and content are sent from the communication control unit 162 to the browser 164 and used for display and the like.
  • The remote control I/F 162-2 is an interface with the remote controller 200, and may have, for example, an infrared communication function.
  • The remote control I/F 162-2 receives remote control signals output by the remote controller 200.
  • The smart device I/F 162-3 is an interface with the smart device 300; it may be connected by a wired cable, or it may be a wireless communication interface such as Wi-Fi (registered trademark) or Bluetooth (registered trademark).
  • the receiving device 100 can perform direct data communication with the smart device 300.
  • the receiving device 100 can also perform data communication with the smart device 300 via the network I/F 162-1.
  • FIG. 3 is a block diagram showing a configuration example of the smart device 300.
  • The smart device 300 includes a voice recognition unit 310, a system controller 301, a ROM 302 that stores programs and the like, a RAM 303 used as a working memory, a motor control unit 304, a motor 321 controlled by the motor control unit 304, and a drive mechanism 322 driven by the motor 321 to change the orientation of the smart device 300 and the like. Furthermore, the smart device 300 includes a clock 305, a camera 311, a microphone 312, a speaker 313, an interface unit 314, and a battery 333.
  • the smart device 300 can input the voice received from the microphone 312 to the voice recognition unit 310, and extract a command or the like superimposed on the voice.
  • The extracted command can be output from the interface unit 314 to an external device, for example.
  • When the smart device 300 in this embodiment receives a control signal for activating the voice command receiving function or the command acquisition function based on voice recognition, it activates its own voice command acquisition function.
  • A normal smart device needs to receive a voice input called a trigger word before activating the voice command acquisition function.
  • In this embodiment, by contrast, the scene designation signal output by the remote controller 200 is used to specify the scene and then start reception of voice commands.
  • The system controller 301 temporarily stores, in the RAM 303, the "voice signal" picked up while the voice detection function is on, the "voice detection time data" at the time of pickup, and the "smart speaker identification information", together forming the "voice command information" (which may be simply referred to as an instruction). In addition, the system controller 301 performs control so as to transmit the "voice command information" to the server 400 via the interface unit 314.
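  • A minimal sketch of this behavior of the system controller 301, under assumed names and interfaces:

```python
import datetime

class SmartDeviceController:
    """Hypothetical sketch of the system controller 301; names are illustrative."""

    def __init__(self, server_if, speaker_id):
        self.server_if = server_if    # interface unit 314 toward the server 400
        self.speaker_id = speaker_id  # "smart speaker identification information"
        self.ram = []                 # stands in for RAM 303

    def on_voice_picked_up(self, voice_signal):
        voice_command_info = {
            "voice_signal": voice_signal,
            "voice_detection_time": datetime.datetime.now(),
            "speaker_id": self.speaker_id,
        }
        self.ram.append(voice_command_info)      # temporary storage in RAM 303
        self.server_if.send(voice_command_info)  # sent as "voice command information"
```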
  • Fig. 4 is a block diagram showing a configuration example of a server.
  • the server 400 includes an interface unit 411, a system controller 422, a storage unit 423, and an analysis unit 424.
  • The scene designation data and instructions sent from a television receiver or a smart speaker are temporarily taken into (buffered in) the storage unit 423 under the control of the system controller 422.
  • the analysis unit 424 analyzes the received data taken into the storage unit 423.
  • the analysis unit 424 determines the scene of the broadcast program based on the received scene designation data, and executes instructions on the determined scene.
  • The analysis unit 424 obtains, from a database or the like, content-related information about the scene (image) designated by the scene designation data, for example the place shown in the scene, such as "This is Yatsugatake in Nagano Prefecture." This information is output to the receiving device 100 and the smart device 300, which provide the content-related information to the user.
  • the processor obtains program information from the server through the network I/F.
  • the program information contains product/service information.
  • the processor compares the program recorded in the storage device with the product/service information included in the program information PI, generates recommended product information, and displays it on the display device through the display I/F.
  • FIG. 13 is a block diagram showing the structure of the server 203.
  • the server 203 includes a processor 2014, a storage device 2015, and a network I/F 2031.
  • The processor 2014 includes a CPU, a ROM, a RAM, and so on.
  • The ROM stores software programs for various functions; the CPU reads the necessary programs from the ROM, loads them into the RAM, and executes them, thereby realizing the various functions of the server 203.
  • the processor 2014 may communicate with the receiving device through the network I/F 2031.
  • the storage device 2015 has a program information storage area 2015a in which program information PI is stored.
  • The program information PI includes product/service information about the goods or services of the CMs broadcast in a broadcast program and about the goods or services introduced in the program. The server 203 therefore holds program information PI containing scene information.
  • The processor 2014 constitutes a program information management unit that associates scene information, concerning scenes that include at least one of a commodity and a service, with the program and manages it as program information.
  • Fig. 14 is a structural diagram of program information PI.
  • the program information PI includes information about the title, date, channel (CH), product (or service), and time period of the program.
  • the product/service information is information about the product (or service) and the time period. That is, the program information PI includes product/service information, and the product/service information includes information about the product (or service) and information about the time period in which the product (or service) is introduced.
  • The service information in the product/service information includes, for example, a store featured in gourmet information, the name of the service provided, and the like.
  • the title is the name of the broadcast program and is used to distinguish information from other programs.
  • The date is information indicating the broadcast date of the program: the year, month, and day on which the program was broadcast. The date may also include the broadcast start time and broadcast end time of the program.
  • the channel is information indicating the channel on which the program is broadcast.
  • Commodity (or service) and time period are a pair of information.
  • the program information PI contains information about multiple commodities (or services) and the scene that introduces the corresponding commodities (or services).
  • Commodity (or service) 1 is information indicating a commodity (or service) related to a scene (for example, CM1) broadcast in the program.
  • the time zone 1 is information indicating the broadcast time zone of the scene of the commodity (or service) 1 (for example, CM1).
  • Commodity (or service) 2 is information indicating commodities (or services) related to other scenes (for example, CM2) broadcast in the program.
  • the time zone 2 is information indicating the broadcast time zone of the scene of the commodity (or service) 2 (for example, CM2).
  • FIG. 15 is a diagram showing a plurality of time periods in which a plurality of scenes of a plurality of commodities (or services) included in one program content are introduced.
  • The program content C1 includes multiple scenes introducing multiple products (or services). The figure shows that the program content C1 is broadcast from start time t0; the broadcast of CM1 for product (or service; hereinafter sometimes simply "product") X1 starts at time t1, when time T1 has elapsed from start time t0, and ends at time t2. Similarly, the broadcast of CM2 for product X2 starts at time t3, when time T2 has elapsed from start time t0, and ends at time t4.
  • Each product (or service) and each time period in the program information PI indicate a related product (or service), such as one featured in a CM in the program content shown in FIG. 15, and the broadcast time period of its scene. The server 203 therefore holds, for each program broadcast in the past, the products (or services) introduced in the CMs and the like broadcast in that program, together with their broadcast time periods. Accordingly, the program information PI contains the start time and end time of at least one scene of the recorded programs.
  • the program information PI includes link address information of the information of each scene, and the information of each scene is stored in a storage area indicated by the link address information.
  • the program information PI may also have a structure including address information and the like storing such information related to each scene.
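  • Taken together, the program information PI described above could be represented as in the following sketch (Python 3.9+); the class and field names are assumptions for illustration.

```python
from dataclasses import dataclass, field
from datetime import date, time

@dataclass
class ScenePeriod:
    """One product (or service) paired with the time period of its scene."""
    product: str        # e.g. product X1 of CM1
    start: time         # scene start (t1 in FIG. 15)
    end: time           # scene end (t2 in FIG. 15)
    link: str = ""      # link address of the per-scene information, if any

@dataclass
class ProgramInformationPI:
    """Program information PI: title, date, channel, and product/period pairs."""
    title: str
    broadcast_date: date
    channel: str
    scenes: list[ScenePeriod] = field(default_factory=list)
```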
  • the storage device 2015 has a product display information storage area 2015b that stores image and text display information for each product (or service) displayed in the recommended product display image.
  • the image data stored in the product display information storage area 2015b is data representing representative images and videos of each product (or service).
  • the text data stored in the product display information storage area 2015b is the name (ie, product name) of each product (or service). It should be noted that the text data may also include text describing the characteristics, content, etc. of each commodity (or service).
  • When a window is selected, the processor plays the scene of the commodity (or service) (the CM scene or the in-program scene) from the program content that includes the scene introducing the commodity (or service) related to the selected window. Since that program content is stored in the program content storage area, the processor reads it from the program content storage area and plays it by cueing to the scene of the commodity (or service) related to the selected window (cueing means finding the starting point when a certain part of a video is to be played). Using the time period information of the product (or service) contained in the program information PI, the scene can be cued, and the content (for example, a CM) of the product (or service) is played from the beginning of the scene.
  • the recommended product display image has a title part and a product display part.
  • the title part is a display area where the name of the screen is displayed, and here is a display area where the text of "recommended product list" is displayed.
  • the merchandise display part is a display area of multiple merchandise (or services) arranged according to a priority determined by multiple predetermined rules.
  • In the product display part, the windows of the products (or services) are arranged in tiles according to priority; the windows are displayed in the recommended product display image in order of priority from top to bottom.
  • Each window has an image display part and a text display part.
  • the image display section is a display area where the image in the display information acquired during display is displayed.
  • each window becomes a GUI (Graphical User Interface) button. Therefore, the user can select a window by, for example, moving the cursor or the like on a desired window.
  • Although the image data of the product (or service) displayed on the image display unit is acquired from the server, it may also be extracted from the recorded program content.
  • the text display section is a display area for the text of the acquired display information.
  • In this way, the image and text of each product (or service) are displayed in the recommended product display image.
  • The recommended product display image displays the plurality of products (or services) in an order that reflects the user's interest, according to the user's taste or the like.
  • the images related to each product (or service) may also be images related to the program.
  • the image about each product (or service) is a candidate image representing a candidate for a selectable scene. Therefore, the scene information display section selectively arranges and displays scene information about scenes including at least one of goods and services, which are related to the recorded program, together with the candidate images.
  • the content playback process is executed, and the playback image presented by the playback process is displayed on the display device.
  • The processor plays the scene of the commodity (or service) (the CM scene or the in-program scene) from the program content that includes the scene introducing the commodity (or service) related to the selected window. Since that program content is stored in the program content storage area, in this process the processor reads the program content from the program content storage area and plays the scene of the commodity (or service) related to the selected window by cueing (finding the starting point when a certain part of a video is to be played). The processor thus constitutes a scene playback processing unit that, in accordance with the playback instruction, generates playback information of the scene selected from the scene information and performs playback.
  • From the time period information of the product (or service) contained in the program information PI, the scene can be cued, and the content (for example, a CM) of the product (or service) is played from the beginning of the scene.
  • The content (for example, CM1) of the product (or service) is then played on the screen of the display device. That is, a scene such as a CM of the selected product (or service) is played from the beginning of its time period. In this way, playback information of the scene selected from the scene information is generated according to the playback instruction for the selected scene information.
  • Alternatively, scene playback may start from the beginning of the program content (i.e., the broadcast start time of the program) that includes the scene (for example, a CM) of the commodity (or service) related to the selected window.
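  • A sketch of this cueing logic, assuming a generic player interface with load/seek/play methods (all names are illustrative):

```python
def scene_offset_seconds(program_start, scene_start):
    """Offset of the scene from the head of the recording (T1 in FIG. 15)."""
    return (scene_start - program_start).total_seconds()

def play_scene(player, recording, offset_seconds, from_program_start=False):
    # 'player' is an assumed playback interface; cue either to the scene
    # start or to the head of the program content, as described above.
    player.load(recording)
    player.seek(0.0 if from_program_start else offset_seconds)
    player.play()
```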
  • The recommended product display image is an image in which rectangular windows for the products (or services) are arranged in two-dimensional tiles, but the images and text of the products (or services) may instead be displayed in a list arranged in order of priority.
  • The collation process is performed in the receiver here, but the program content recorded in the receiver may instead be sent to the server and the process executed in the server.
  • the recommended product display image is displayed on the display device of the receiving device, but the recommended product display image may be displayed on a display of a terminal device different from the receiving device.
  • FIG. 16 is a structural diagram of the recorded content display system 201A.
  • The recorded content display system 201A includes a receiving device 202, a server 203, and a smartphone 206 as a portable terminal device. The smartphone 206 can communicate with the receiving device 202 and the server 203 via a network 204 such as the Internet.
  • the recommended product display image is displayed on the display 206a of the smartphone 206. Therefore, the recommended product display program is stored in the ROM of the processor 206b (indicated by the dotted line) of the smartphone 206.
  • a touch panel device (not shown) is mounted on the display 206a.
  • the processor 206b of the smartphone also includes a CPU, ROM, and RAM, and implements various functions of the smartphone by executing various programs stored in the ROM.
  • the smart phone 206 executes a process flow of the recommended product display program.
  • The display processing of the recommended products is almost the same as the recommended-product display processing on the TV, but it is executed by the processor 206b of the smartphone 206.
  • the processor 206b executes various processes to display the recommended product display image on the screen of the display 206a.
  • the smart phone communicates with the receiver through the network, and obtains the management information MI of the recording management information storage area from the receiver.
  • the processor 206b determines whether the user has selected a window in the recommended product display image displayed on the display by touching the screen or the like. If no window is selected, nothing is done.
  • When a window is selected, the processor 206b sends a play instruction signal to the receiver via the network, the play instruction signal instructing that the scene of the goods (or services) related to the selected window be played.
  • That is, an instruction sending unit sends a playback instruction for the scene selected from the scene information to the receiver, which is the device storing the recorded program.
  • The play instruction signal also contains content determination information (included in the management information MI) for determining the program content, such as the program name and the broadcast date, and the time period information of the scene (CM, etc.) of the goods (or services) to be played, which enables determination of the content (CM, etc.) that should be played.
  • The play instruction signal may also be sent from the smartphone 206 to the receiver via a network other than the network 204 (for example, a home LAN), or directly through a short-range wireless signal or the like.
  • When the processor of the receiver receives the play instruction signal, it plays the scene (CM, etc.) to be played from the program content recorded in the storage device, according to the play instruction signal. As a result, the user can watch the CM or the like of the selected product (or service).
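  • The play instruction signal exchange could be sketched as follows; the URL, the JSON transport, and the field names are assumptions, since the patent does not fix a concrete message format.

```python
import json
import urllib.request

def send_play_instruction(receiver_url, title, broadcast_date, start, end):
    """Hypothetical play instruction signal from the smartphone 206 to the
    receiver; all names here are illustrative assumptions."""
    payload = {
        # content determination information taken from the management info MI
        "content": {"title": title, "date": broadcast_date},
        # time period of the scene (CM, etc.) that should be played
        "scene_period": {"start": start, "end": end},
    }
    req = urllib.request.Request(
        receiver_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)  # the receiver cues and plays the scene
```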
  • In this case, the smartphone 206 is the scene display control device. Using a smartphone as the scene display control device can therefore obtain effects similar to those of using a smart TV as the scene display control device.
  • The user only has to select, from the recommended product display screen, a product (or service) related to a CM broadcast in a program, or a product (such as a book) or service (such as a restaurant introduction) featured in a segment of the program; it is then possible to simply watch the desired commodity (or service) scenes (i.e., CM scenes or in-program segments) from the recorded content of multiple programs.
  • Each of the above-mentioned processors has a CPU and a ROM, and the CPU reads and executes software programs or computer instructions stored in the ROM to realize the various functions of each device; however, each processor may instead be constituted by an electronic circuit, or formed as a circuit block of an integrated circuit such as an FPGA (Field Programmable Gate Array).
  • Although the program content is recorded in the storage device of the receiver here, it may instead be recorded in the server.
  • In that case, the storage device of the server has a program content storage area, and the server plays back the program content of the designated program according to a playback instruction from the receiver and sends the video signal to the receiver.
  • the video of the received video signal is displayed on the display device of the receiver.
  • A playback instruction is sent to the server.
  • In the server, since the management information MI is also stored in its storage device, part of the processing is executed by the processor of the server.
  • the display data of the recommended product display screen is sent to the receiver, the recommended product display screen is displayed on the display device, and the user can select the product or the like.
  • the control unit receives recommendation information, other recommendation information, and status information of the receiving device from the merchandise sales management server (S3036).
  • When the control unit receives the recommendation information, the status information of the receiving device, and the other recommendation information from the merchandise sales management server, it inquires about the current status of the receiving device in the processing of S3032, and receives the current status information from the receiving device in the processing of S3033.
  • The management server generates predicted state information and table information, where the predicted state information is information predicting the state of the receiving device, and the table information associates each predicted state of the predicted state information with the recommendation information corresponding to that predicted state.
  • The control unit then determines whether the status information from the merchandise sales management server matches the current status information from the receiving device (S3034).
  • When the control unit determines in S3034 that the status information from the merchandise sales management server matches the current status information from the receiving device, it displays the recommendation information on the display unit in S3035, and the process ends.
  • Otherwise, the control unit determines whether there is any other recommendation information that matches the current status information (S3037).
  • When the control unit determines that there is other recommendation information that matches the current status information (Yes in S3037), it displays that other recommendation information on the display unit (S3038), and the process ends.
  • the control unit determines that there is no other recommendation information that matches the current status information among the other recommendation information (No in S3037), the process ends.
  • In this way, recommendation information that does not match is not displayed on the display unit of the terminal device, and other recommendation information that matches the current state of the receiving device can be displayed instead.
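  • The matching logic of steps S3034-S3038 can be condensed into the following sketch; the data shapes are assumed for illustration.

```python
def choose_recommendation(recommendation, reference_state,
                          other_recommendations, current_state):
    """Hypothetical condensation of S3034-S3038: show a recommendation only
    if the receiver state it was generated for still holds."""
    if reference_state == current_state:           # S3034 -> S3035
        return recommendation
    for other in other_recommendations:            # S3037
        if other["state"] == current_state:
            return other["info"]                   # S3038
    return None                                    # nothing is displayed
```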
  • In the following configuration, the receiver no longer sends status information to the management server.
  • the execution of the display control program is notified to the merchandise sales management server of the management server via the Internet.
  • When the merchandise sales management server is notified that the display control program is being executed, it generates the predicted state information and transmits the table information to the terminal device.
  • For the predicted state information and the table information, refer to priority application JP2020-019997.
  • In the predicted state information, the channel of the receiver is associated with a time. However, the predicted state information is not limited to this; a specific recorded program may instead be associated with the time counted from the beginning of that recorded program. In the table information, each predicted state of the predicted state information is associated with the recommendation information corresponding to that predicted state.
  • When the terminal device receives the predicted state information and the table information from the merchandise sales management server, it inquires about the current state of the receiver. The terminal device determines whether the current state information of the receiver matches a predicted state of the predicted state information. When the terminal device determines that the current state information of the receiver and the predicted state information at least partially coincide, it acquires the recommendation information corresponding to the predicted state from the table information and displays it on the display unit.
  • the terminal device acquires the predicted state information and the table information, and acquires the state information indicating the state of the receiver at the time period corresponding to the acquisition of the predicted state information and the table information.
  • the terminal device compares the predicted state information with the state information, and when at least a part of the predicted state information and the state information match, obtains the recommended information corresponding to the predicted state from the table information and executes the display processing of the recommended information.
  • When the terminal device determines that the current status information of the receiver matches the predicted state, it acquires the recommendation information corresponding to the predicted state from the table information and displays the recommendation information on the display unit.
  • the control unit receives prediction status information and table information from the merchandise sales management server.
  • the control unit inquires the receiver about the current state, and receives the current state information from the receiver.
  • the control unit determines whether the current state information of the receiver matches the predicted state of the predicted state information. When the control unit determines that the current state information of the receiver does not match the predicted state of the predicted state information, the process ends. That is, when the current state information of the receiver does not match the predicted state of the predicted state information, the control section ends the process without displaying anything on the display section.
  • When the control unit determines that the current state information of the receiver matches a predicted state of the predicted state information, it acquires the recommendation information corresponding to that predicted state from the table information, displays it on the display section, and ends the process.
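  • A sketch of this table lookup, assuming each table entry associates a channel and a time range (the predicted state) with recommendation information; field names are assumptions.

```python
import datetime

def lookup_recommendation(table_info, current_channel, now=None):
    """Hypothetical match of the receiver's current state against the table
    information, where a predicted state ties a channel to a time range."""
    now = now or datetime.datetime.now().time()
    for entry in table_info:
        if (entry["channel"] == current_channel
                and entry["start"] <= now <= entry["end"]):
            return entry["recommendation"]
    return None  # no match: nothing is displayed
```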
  • the information communication system of the present embodiment can display recommendation information that matches the current status of the receiver with high accuracy even without sending status information from the receiver to the management server.
  • the terminal device may periodically inquire the current status of the receiver, and display the recommendation information on the display unit when the current status of the receiver matches the predicted status.
  • For example, the terminal device periodically inquires about the current status of the receiver, and when it determines that the user is using the receiver to watch the program on channel CH4 from 9:30 to 10:30, it obtains the recommendation information corresponding to that predicted state from the table information and displays it on the display unit.
  • Instead of the terminal device periodically inquiring about the current state of the receiver, the receiver may periodically transmit its current state to the terminal device.
  • The smart device 300 may also have the following mechanisms: a mechanism that immediately turns on the voice detection function and stores the picked-up voice signal, together with the voice detection time data at the time of pickup, in the memory as a "voice command"; and a mechanism for sending the "voice command" to the server 400.
  • The receiving device 100 may have at least the following mechanisms: a mechanism that records, in the information recording unit as "scene designation data", the scene designation time data indicating the time position of the scene of the image when the scene designation signal was received, together with the content information (program information, etc.) of the content including the scene; and a mechanism that transmits the "scene designation data" to the server 400.
  • the remote controller 200 transmits the scene designation signals (S2a) and (S2b) to the receiving device 100 and the smart device 300, respectively.
  • The receiving device 100 receives the scene designation signal (S2a) from the remote controller and records, in the information recording unit as "scene designation data", at least the "scene designation time data" indicating the time position of the scene of the image when the scene designation signal (S2a) was received, together with the "content information (for example, program information)" of the content including the scene and the "TV identification information". The "scene designation data" is then sent to the server 400.
  • The smart device 300 receives the scene designation signal (S2b) from the remote controller 200, turns on the voice detection function, and stores the picked-up "voice signal", the "voice detection time data" at the time of pickup, and the "smart speaker identification information" in the memory as "voice command information". The "voice command information" is then sent to the server 400.
  • The "scene designation time data" and the "voice detection time data" may instead be the times at which the "scene designation data" and the "voice command information" are sent to the server 400.
  • These time data are referred to as "combined collation data" and are used for pairing the "scene designation data" with the "voice command information".
  • In the server 400, the scene designation data and the voice command information are linked and stored in the memory as a database.
  • the database is parsed and processed for various purposes.
  • As reference information for the linking, the approximate agreement of the "scene designation time data" and the "voice detection time data" is used.
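  • Such time-based pairing might be implemented as in the following sketch; the field names and the allowed skew are assumptions for illustration.

```python
def pair_by_time(scene_data, voice_infos, max_skew_seconds=10):
    """Hypothetical 'combined collation': link a scene designation record to
    the voice command information whose detection time is closest to the
    scene designation time, within an allowed skew."""
    best, best_dt = None, None
    for v in voice_infos:
        dt = abs((v["voice_detection_time"]
                  - scene_data["scene_time"]).total_seconds())
        if dt <= max_skew_seconds and (best_dt is None or dt < best_dt):
            best, best_dt = v, dt
    return scene_data, best  # stored as a linked pair in the database
```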
  • FIG. 5 is a sequence diagram showing an example of the operation of the system according to the first embodiment, showing an example of the operation of the above-mentioned voice information processing system along the passage of time.
  • (2a) of FIG. 5 shows the elapse of real time.
  • (2b) shows the passage of the program scene on the screen of the receiving device 100.
  • (2c) is the elapse of time in the remote controller 200, and indicates that the scene designation button 201 is operated at time t1.
  • (2d) is the time elapsed in the smart device 300, which indicates that the voice detection function is turned on at time t1, and the voice is picked up from the microphone.
  • Once the voice detection function is turned on, it may be turned off when the surrounding voice falls below a certain level, or after a predetermined time has elapsed (30 seconds, 2 minutes, etc.); the condition can also be set arbitrarily by the user.
  • (2e) is the elapsed time in the receiving device 100, and represents the time period during which the receiving device 100 generates scene designation data and sends it to the server for processing.
  • (2f) is the time elapsed in the server 400.
  • The server 400 receives the "scene designation data" and the "voice command information" from the receiving device 100 and the smart device 300, links the scene designation data with the voice command information, and stores them in the memory as a database.
  • the server 400 performs analysis processing on the database for various purposes, or returns the analysis result to each receiving device 100 and/or smart device 300.
  • The above example shows the collection of information for a program broadcast in real time, but the same approach can also be applied when a broadcast program is temporarily recorded in a recording and playback device and played back later.
  • In this case, the elapsed time from the start time of the program in the time information is used as the aforementioned time t1.
  • Identification information (or attribute information) indicating that the program is a recorded broadcast program is added to the program information (program name, etc.) included in the scene designation data.
  • When the "scene designation data" and the "voice command information" are sent to the cloud server, "real-time time information" is added and sent as the reference time information for linking the two.
  • FIG. 6 is a flowchart showing an example of the operation of the system according to this embodiment, in the case where the remote controller 200 shown in FIGS. 1 and 2 starts the scene designation mode.
  • The scene designation button 201 may double as a button for starting the scene designation mode, or there may be a separate scene designation mode start button for setting a predetermined operation mode in advance.
  • Suppose the remote controller 200 is activated in the scene designation mode (SA1), and the user 5 is watching a program on the screen. Assume there is a scene the user likes, for example, and the user operates the scene designation button 201 (SA2). The communication control unit 162 and the system control unit 161 of the receiving device 100 then operate together, and at least the time data and program information (channel, program name, etc.) of the current scene are temporarily stored as "scene designation data" in the scene information storage unit of the memory in the system control unit 161 (SA3).
  • The "scene designation data" stored in the scene information storage unit and the "TV identification information" for identifying the receiving device 100 are combined and transmitted to the cloud server via the network I/F 162-1 (SA5).
  • If the TV identification information is already included in the scene designation data, it does not need to be combined.
  • Meanwhile, in the smart device 300, the interface unit 314 receives the scene designation signal, and the system controller 301 turns on the microphone 312 to enable voice input (SA7).
  • The voice is collected, and the voice data and the time data at the time of collection are stored in the memory (RAM 303) as "voice command information" (SA8).
  • Then, the "voice command information" is sent to the server 400 together with the "TV identification information" and/or the "remote control identification information" of the corresponding receiving device 100.
  • The "speaker identification information" of the smart device 300 and/or the "remote control identification information" may also be sent to the server 400.
  • If the "speaker identification information" is already included in the "voice command information", there is no need to add it again at the time of the above-mentioned transmission.
  • The above steps SA3-SA5 and SA6 can be implemented as functional modules in the system control unit 161 of the receiving device 100.
  • Likewise, the above steps SA7-SA9 and SA6 can be implemented as functional modules in the system controller 301 of the smart device 300.
  • the scene designation data and voice commands sent from a plurality of television receivers and smart speakers are temporarily taken into (buffered) the storage unit 423 under the control of the system controller 422.
  • the analysis unit 424 analyzes the received data taken into the storage unit 423, and first, organizes the data for each program.
  • The analysis unit 424 in this embodiment searches the database based on the scene determination information and instructions received from the receiving device 100 and the smart device 300, obtains relevant provision information (information associated with the scene determination information), and outputs it to the receiving device 100 and the smart device 300.
  • FIG. 7 is a diagram showing an example of the operation of the system according to this embodiment: the operation of the server 400 when "scene designation data" is transmitted from the receiving device 100 and "voice command information" is transmitted from the smart device 300.
  • The server 400 temporarily stores incoming "scene designation data" in the buffer 423a and "voice command information" in the buffer 423b. "Scene designation data" and "voice command information" arrive one by one from different TV devices and smart speakers.
  • The combination engine 424a matches mutually corresponding "scene designation data" and "voice command information" based on the combination collation data, and stores each matched set in the pairing storage unit 423c.
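  • The description does not specify the combination collation data; one plausible reading, sketched below purely as an assumption, is that a pair shares the same TV identification information and has time data within a small window:

```python
from typing import Optional

PAIRING_WINDOW_SEC = 30.0  # assumed window; not given in the description

def find_pair(scene_buffer: list[dict], voice_buffer: list[dict]) -> Optional[tuple[dict, dict]]:
    """Match one "scene designation data" entry (buffer 423a) with one
    "voice command information" entry (buffer 423b), as the combination
    engine 424a might. The collation keys are assumptions."""
    for scene in scene_buffer:
        for voice in voice_buffer:
            same_tv = (scene["tv_identification_information"]
                       == voice["tv_identification_information"])
            close_in_time = abs(scene["scene_time"] - voice["pickup_time"]) <= PAIRING_WINDOW_SEC
            if same_tv and close_in_time:
                return scene, voice  # the set goes to the pairing storage unit 423c
    return None
```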
  • The "voice command information" stored in the pairing storage unit 423c is analyzed by the command analysis unit 424b, which determines the content of the voice command.
  • The command analysis unit 424b distinguishes whether the voice command is a command for TV control (for example, "stop", "rewind", "fast forward", "skip", "mark", "set the screen to black", "power off", etc.) or an instruction for acquiring information related to the image scene (for example, "Where was the current scene shot?", "Where is that place?", "Who is the person on screen now?", "Who is the manufacturer of the car shown now?", "What is the model of that car?", "Where is that hotel?", etc.).
  • A TV control command is staged in the buffer 423e and sent to the corresponding receiving device 100 for TV control.
  • An information acquisition command is used to read the corresponding information from the program element information storage unit 423h and stage it in the buffer 423g.
  • Examples of the information corresponding to an instruction include the name of the director, the name of the manufacturer, profiles of the performers, and tourist attractions.
  • Such information is sent to the smart device 300, for example, as voice response information.
  • In addition to the voice response information, picture-in-picture (PIP) image data may also be sent.
  • The program element information storage unit 423h stores accumulated information that the server 400 itself collects from program information and various media information. Viewing histories and the like collected from each television receiving device are also accumulated in this information.
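  • The split between TV control and information acquisition described above can be pictured with a minimal dispatch sketch; the keyword set and the return labels below are assumptions, not the patent's classification rules:

```python
TV_CONTROL_COMMANDS = {"stop", "rewind", "fast forward", "skip", "mark", "power off"}

def dispatch(command_text: str) -> str:
    """Classify a recognized command roughly as the command analysis
    unit 424b is described to do."""
    if command_text.lower().strip() in TV_CONTROL_COMMANDS:
        # staged in buffer 423e and sent to the receiving device 100
        return "tv_control"
    # otherwise looked up in the program element information storage
    # unit 423h and staged in buffer 423g
    return "scene_information_acquisition"

print(dispatch("rewind"))                # tv_control
print(dispatch("Where is that hotel?"))  # scene_information_acquisition
```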
  • (1) In summary, the voice information processing system of this embodiment has: a television device that outputs an image, receives a scene designation signal from a remote controller, and records, as "scene designation data" in an information recording unit, at least scene designation time data indicating the time position of the scene of the image at the moment the scene designation signal is received, together with content information on the content containing the scene; and
  • a smart device having at least a voice pickup function, equipped with: a mechanism that receives the scene designation signal from the remote controller, turns on the voice detection function, and stores in memory, as "voice command information", the picked-up voice signal and the voice detection time data at the time of pickup; and a mechanism that sends the "voice command information" to the cloud server.
  • (2) The television device described in (1) above includes a mechanism for storing an image of the scene. In this way, the user can confirm the stored scene later and execute a voice command on the stored scene.
  • (3) The television device includes a mechanism for displaying an image of the scene on a small screen for a certain period of time. As a result, the user can give voice commands while visually checking the scene of interest.
  • (4) The television device according to any one of (1) to (3) above has a control mechanism (the system control unit 161) that receives the instruction contained in the "voice command information" sent from the cloud server and performs operation control corresponding to the instruction. In this way, the user can save scenes of interest, repeat playback (still-image playback), and the like. Editing processes such as setting chapters for the scene also become easy to perform.
  • (5) The smart device receives the "voice data" that the cloud server acquires based on the instruction contained in the "voice command information" sent to the cloud server, and the voice corresponding to the "voice data" is output from the speaker.
  • In general, a smart device 300 needs to receive a trigger word before it accepts a voice command.
  • As a result, the user may not be able to quickly designate the scene at the moment of interest. Even if a voice command is issued at that moment, a normal smart device 300 must first receive the trigger word and then the voice command, and by the time the command has been extracted through voice recognition, the scene on which the command is executed is later than the scene of interest.
  • In this embodiment, by contrast, the smart device 300 acquires a voice command after the user designates a scene, and can therefore execute the voice command on the designated scene.
  • A user watching the program image of a television broadcast may wish to know further related information about the image scene being displayed at a given moment.
  • Such related information is, for example, the name of a performer appearing in the image scene or the location of the scenery (for example, the name of the area, the address, etc.).
  • In this embodiment, the related information can be obtained through voice commands.
  • Furthermore, the server 400 can recognize the state of the smart device 300 and can appropriately process instructions from the smart device 300.
  • FIG. 8 is a sequence diagram of the system according to the second embodiment, showing the exchange of data and the like among the user 5, the receiving device 100, the server 400, and the smart device 300, and the processing flow of each function.
  • While watching a travel program through the receiving device 100, the user 5 sees a scene of a stylish car driving across a beautiful grassland and thinks, "I want to know where that place is" and "I want to know the manufacturer of that car". At the moment the user 5 sees the scene, the user presses the scene designation button 201 of the remote controller 200 (step S51).
  • When the system control unit 161 receives the scene designation signal output from the remote controller 200 via the remote control I/F 162-2 (step S101), it acquires information on the image being output to the display 170 at the timing of receiving the scene designation signal.
  • The scene designation time data may be, for example, the absolute time at which the scene is displayed, or the counted time (relative time) from the start of the content until the scene is displayed.
  • The scene designation time data may be acquired from a clock and a counter provided in the receiving device 100, or from program information of a broadcast signal or the like.
  • The system control unit 161 also acquires viewing content information related to the content being output.
  • The system control unit 161 generates scene identification information that includes the viewing content information and the scene designation time data (step S102).
  • The system control unit 161 transmits the generated scene identification information from the network I/F 162-1 to the server 400 via the network 500 (step S103).
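  • Steps S101 to S103 can be pictured with the following sketch of the receiving-device side; the record structure, the field names, and the choice of Unix time are assumptions layered on top of the description:

```python
import json
import time

def on_scene_designation_signal(channel: str, program_name: str,
                                content_start: float) -> str:
    """Build scene identification information from viewing content
    information plus scene designation time data (steps S101-S102).

    Both time variants mentioned in the description are included:
    the absolute time and the relative time counted from content start.
    """
    now = time.time()
    scene_identification_information = {
        "viewing_content_information": {
            "channel": channel,
            "program_name": program_name,
        },
        "scene_time_absolute": now,
        "scene_time_relative": now - content_start,
    }
    # serialized form transmitted to the server 400 (step S103)
    return json.dumps(scene_identification_information)

print(on_scene_designation_signal("ch4", "Travel Program",
                                  content_start=time.time() - 130.0))
```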
  • The server 400 receives the scene identification information transmitted from the system control unit 161 and stores it in the storage unit 423 (step S131).
  • Next, the system control unit 161 outputs, from the network I/F 162-1 to the network 500, an activation signal for causing the smart device 300 to activate its voice command acquisition function (step S104).
  • The activation signal is first received by the server 400 and then transferred to the smart device 300 via the network 500 (steps S132, S141).
  • In this way, the server 400 can manage the state of the smart device 300.
  • Here, the receiving device 100 transmits the activation signal explicitly, but the scene identification information output in step S103 may instead serve as the activation signal.
  • When the system controller 422 receives the activation signal from the receiving device 100 in step S132, it changes its data processing mode (steps S132, S133). With this mode change, a command received later is executed on the scene identification information received in step S131 (step S133).
  • The mode change is shown explicitly as step S133; however, if the system controller 422, after receiving the scene identification information and the activation signal in steps S131 and S132, simply treats any command received later as applying to that scene identification information, step S133 may be omitted.
  • In step S142, when the system controller 301 receives the activation signal, it activates the voice command acquisition function of the voice recognition unit 310. A mode change is shown as being performed at the same time in step S142, meaning that the operation of the smart device 300 departs from its normal processing. In the normal operation (normal mode) of the smart device 300, the voice command acquisition function is activated only after a trigger word is received; in this embodiment, however, the smart device 300 uses the activation signal as the trigger for activating the voice command acquisition function. Since it suffices to start the voice command acquisition function once the activation signal has been received in step S141, the mode change action of step S142 may also be omitted.
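  • The trigger-word bypass of steps S141 and S142 can be sketched as a small event loop; the event names and the queue-based framing are assumptions used only to make the control flow concrete:

```python
import queue

def smart_device_loop(events: "queue.Queue[str]") -> None:
    """In normal mode, listening starts only after a trigger word; an
    activation signal starts voice command acquisition directly."""
    listening = False
    while True:
        event = events.get()
        if event == "activation_signal":   # arriving via the server 400
            listening = True               # no trigger word needed
            print("Voice commands can now be received")
        elif event == "trigger_word":      # the normal-mode path
            listening = True
        elif event == "voice_input" and listening:
            print("acquiring voice command")
        elif event == "timeout":
            listening = False              # back to normal mode
        elif event == "stop":
            break

q: "queue.Queue[str]" = queue.Queue()
for e in ("activation_signal", "voice_input", "timeout", "stop"):
    q.put(e)
smart_device_loop(q)
```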
  • At this time, the smart device 300 may also notify the user by voice from the speaker 313 that the voice command acquisition function has been activated (step S143).
  • On hearing from the speaker 313 that the voice command acquisition function is now effective, the user can recognize that a voice command may be issued (step S52).
  • In this way, the user 5 can issue a voice command for the scene designated from the remote controller 200.
  • FIG. 9 is a diagram showing an example of the data flow in the system according to this embodiment, from the point at which the user 5 designates a scene of the content being watched until a voice command can be given for the designated scene.
  • The user 5 presses the scene designation button 201 of the remote controller 200 (data line L201, corresponding to step S51 in FIG. 8).
  • The remote controller 200 outputs a scene designation signal, and the receiving device 100 receives it (data line L202, corresponding to step S101 in FIG. 8).
  • The receiving device 100 outputs the scene identification information and an activation signal, and the server 400 receives them via the network 500 (data lines L203 and L204, corresponding to steps S103, S131, S104, and S132 in FIG. 8).
  • The server 400 outputs the activation signal, and the smart device 300 receives it via the network 500 (data lines L205 and L206, corresponding to steps S132 and S141 in FIG. 8).
  • The smart device 300 outputs a voice notification indicating that the voice command acquisition function has been activated (data line L207, corresponding to steps S142 and S52 in FIG. 8).
  • The user 5 then issues a voice command to the smart device 300 (data line L208, corresponding to step S53 in FIG. 8).
  • Returning to FIG. 8, the user 5 issues a voice command (step S53), for example, the phrase "I want to know where this place is".
  • The voice recognition unit 310 performs voice recognition on the received voice (steps S144, S145).
  • Here, the case where the voice recognition unit 310 is installed in the smart device 300 is shown, but an external voice recognition device or the like on the network 500 may also be used.
  • The voice recognition unit 310 acquires the command (instruction) carried by the voice command based on the text data obtained by the voice recognition (step S146).
  • The command can be acquired, for example, by the smart device 300 sending the text data to an external text conversion device (not shown), which converts the text into a command and returns it to the smart device 300.
  • The smart device 300 sends the acquired command to the server 400 (step S147), and the server 400 receives the command (step S134).
  • The text conversion device used in step S146 may also be located in the receiving device 100; in this case, the sending of the command to the server 400 in step S147 is performed by the receiving device 100.
  • The text conversion device used in step S146 may also be located in the server 400, in which case the server 400 itself may manage the commands.
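  • A text conversion device of the kind mentioned in step S146 could be as simple as keyword matching over the recognized text; the intent labels and matching rules below are invented for the sketch and are not drawn from the disclosure:

```python
def text_to_command(text: str) -> dict:
    """Map recognized text to a structured command (hypothetical rules).

    As the description notes, the real converter could sit in the smart
    device, the receiving device, or the server 400.
    """
    lowered = text.lower()
    if "where" in lowered and "place" in lowered:
        return {"intent": "ask_location"}
    if "manufacturer" in lowered:
        return {"intent": "ask_manufacturer"}
    if "who" in lowered:
        return {"intent": "ask_person"}
    return {"intent": "unknown", "raw_text": text}

print(text_to_command("I want to know where this place is"))  # {'intent': 'ask_location'}
```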
  • The server 400 generates content-related information based on the scene identification information stored in the storage unit 423 in step S131 and the command received in step S134 (step S135). Specifically, the server 400 identifies the scene from the scene identification information and performs the processing requested by the received command on the identified scene to obtain the content-related information.
  • The content-related information is the result of executing the command on the identified scene, and serves as the response to the voice command issued by the user. For example, in response to the voice command "I want to know where this place is", content-related information such as "It is Yatsugatake in Nagano Prefecture" is generated.
  • The content-related information is sent to the receiving device 100 and the smart device 300 as needed. When the receiving device 100 receives content-related information, it may, for example, display it on the screen as text information. When the smart device 300 receives content-related information, it may, for example, output it by voice.
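  • Step S135 can be illustrated with a toy lookup; the table contents, the time bucketing, and the simplified flat scene record are all fabricated for the sketch:

```python
# Hypothetical stand-in for the program element information storage unit 423h,
# keyed by (program name, 60-second scene bucket). All entries are invented.
PROGRAM_ELEMENTS = {
    ("Travel Program", 120): {
        "ask_location": "It is Yatsugatake in Nagano Prefecture",
        "ask_manufacturer": "ExampleMotors (a hypothetical maker)",
    },
}

def generate_content_related_information(scene_info: dict, command: dict) -> str:
    """Identify the scene, then answer the command on that scene (step S135)."""
    bucket = int(scene_info["scene_time_relative"] // 60) * 60
    elements = PROGRAM_ELEMENTS.get((scene_info["program_name"], bucket), {})
    return elements.get(command["intent"], "No information available for this scene")

print(generate_content_related_information(
    {"program_name": "Travel Program", "scene_time_relative": 130.0},
    {"intent": "ask_location"},
))  # It is Yatsugatake in Nagano Prefecture
```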
  • When the smart device 300 receives a further voice command, the flow returns to step S145: voice recognition is performed, content-related information is generated, and the content-related information is sent to the receiving device 100 and the smart device 300 (Yes in step S149). For example, when keywords such as "further" or "continue" are obtained by voice recognition after the first voice command, it is determined that another voice command is coming, and the processing from step S145 is repeated.
  • If, for example, no further voice command arrives within a certain period of time, the smart device 300 terminates the acquisition of voice commands for the scene identification information stored in the storage unit 423 in step S131 and returns to the normal mode (No in step S149; step S150).
  • When the smart device 300 returns to the normal mode, it notifies the server 400 of this.
  • When the server 400 recognizes that the smart device 300 has returned to the normal mode, it restores its own mode to the state it was in before receiving the scene identification information and the activation signal in steps S131 and S132 (step S138).
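  • The continuation-and-timeout behavior of steps S149 and S150 might look like the following loop; the keyword list and the list-based framing (standing in for a real timeout) are assumptions:

```python
CONTINUATION_KEYWORDS = ("further", "continue")

def scene_session(utterances: list[str]) -> None:
    """Handle voice commands bound to one designated scene.

    A continuation keyword keeps the session alive (Yes in step S149);
    otherwise the device returns to normal mode (step S150) and the
    server restores its own mode (step S138).
    """
    for utterance in utterances:
        print("handle voice command:", utterance)  # processing from step S145
        if not any(k in utterance.lower() for k in CONTINUATION_KEYWORDS):
            break
    print("returned to normal mode; server 400 notified")

scene_session([
    "I want to know where this place is, and further, the maker of the car",
    "What is the model of that car?",
])
```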
  • As described above, the user 5 can execute a voice command on the scene designated from the remote controller 200. For example, when the user 5, watching a program through the receiving device 100, wants to know "where that place is" or "who makes that car" for the objects and scenery that appear, the user presses the scene designation button 201 of the remote controller 200 and then utters voice commands such as "I want to know where this place is" or "I want to know the manufacturer of the car" to the smart device 300.
  • In response, the information the user wants to know about the scene of interest, such as the answer "This is Okutama" or the WWW (World Wide Web) site of the car manufacturer, is output from the smart device 300.
  • FIG. 10 is a diagram showing a first data flow example of the system according to a modification, from the point at which the user 5 designates a scene of the content being watched until a voice command can be given for the designated scene.
  • The user 5 presses the scene designation button 201 of the remote controller 200 (data line L301).
  • The remote controller 200 outputs a scene designation signal, and the receiving device 100 receives it (data line L302).
  • The receiving device 100 outputs an activation signal from the smart device I/F 162-3, and the smart device 300 receives it through the interface unit 314 (data line L303).
  • Using the scene designation signal from the remote controller 200 as a trigger, the receiving device 100 acquires the scene identification information and outputs it to the server 400 via the network 500 (data lines L304, L305).
  • Using the reception of the activation signal as a trigger, the smart device 300 activates the voice command acquisition function and outputs a voice such as "Voice commands can now be received" (data line L306). On hearing this voice notification, the user 5 issues a voice command to the smart device 300 (data line L307).
  • FIG. 11 is a diagram showing a second data flow example of the system according to a modification, from the point at which the user 5 designates a scene of the content being watched until a voice command can be given for the designated scene.
  • The user 5 presses the scene designation button 201 of the remote controller 200 (data line L401).
  • The remote controller 200 outputs a scene designation signal, and the remote control I/F 162-2 of the receiving device 100 receives it (data line L402).
  • The remote control I/F 162-2 outputs the scene designation signal to the system control unit 161 (data line L403).
  • Based on the scene designation signal, the system control unit 161 outputs an activation signal to the smart device 300 (data line L404).
  • Using the scene designation signal from the remote controller 200 as a trigger, the receiving device 100 acquires the scene identification information and outputs it to the server 400 via the network 500 (data lines L405, L406).
  • Using the reception of the activation signal as a trigger, the smart device 300 activates the voice command acquisition function and outputs a voice such as "Voice commands can now be received" (data line L407). On hearing this voice notification, the user 5 issues a voice command to the smart device 300 (data line L408).
  • FIG. 12 is a diagram showing a third data flow example of the system according to a modification, from the point at which the user 5 designates a scene of the content being watched until a voice command can be given for the designated scene.
  • The data flow in this modification corresponds to the data flow in the first embodiment.
  • The user 5 presses the scene designation button 201 of the remote controller 200 (data line L101).
  • The remote controller 200 outputs a scene designation signal, and the receiving device 100 receives it (data line L102).
  • The smart device 300 also receives the scene designation signal output by the remote controller 200 (data line L103).
  • Using the scene designation signal from the remote controller 200 as a trigger, the receiving device 100 acquires the scene identification information and outputs it to the server 400 via the network 500 (data lines L104, L105).
  • Using the reception of the scene designation signal on data line L103 as a trigger, the smart device 300 activates the voice command acquisition function and outputs a voice such as "Voice commands can now be received" (data line L106).
  • When the user 5 hears the voice notification that the voice command acquisition function has been activated, the user 5 speaks a voice command to the smart device 300 (data line L107).
  • In this way as well, the user 5 can issue a voice command for the scene designated from the remote controller 200.
  • An embodiment of the present application further provides a non-volatile storage medium on which a program or computer instructions are stored.
  • A computer device (for example, the aforementioned server, receiver, or terminal device) implements the methods described above by running the program or computer instructions.
  • The above methods may also be carried out by specific storage modules or devices on the aforementioned servers, receivers (for example, televisions or video recorders) and terminal devices (for example, smartphones or tablet computers); when such a module or device runs the computer instructions or program, the method described in each of the above embodiments is implemented.
  • In the drawings, the width, thickness, shape, etc. of each part may be shown schematically compared with their actual appearance.
  • In the block diagrams, data and signals may be exchanged between blocks that are not connected, or, even where blocks are connected, in directions not indicated by the arrows.
  • The functions shown in the block diagrams and the processing shown in the flowcharts and sequence diagrams can be implemented by hardware (IC chips, etc.), software (programs, etc.), digital signal processors (DSP), or a combination of such hardware and software.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The invention relates to a receiving device capable of receiving a voice instruction for executing processing on an image scene designated by a user, a server, and a voice information processing system. The receiving device comprises: a control signal receiving mechanism which, while image content is being output by a display mechanism, receives a control signal for designating a scene of the image constituting the image content, that is, a scene designation signal; and a control mechanism which receives voice, performs voice recognition on the voice, and generates a start command for starting instruction acquisition by a voice instruction acquisition mechanism that acquires instructions.
PCT/CN2021/075126 2020-02-07 2021-02-03 Receiving device, server, and voice information processing system WO2021155812A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202180001659.7A CN113498538A (zh) Receiving device, server, and voice information processing system

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
JP2020019997A JP7181907B2 (ja) Information communication system, receiver, terminal device, display control method, and display control program
JP2020-019686 2020-02-07
JP2020019686A JP7272976B2 (ja) Scene information providing system and receiving device
JP2020-019997 2020-02-07
JP2020-155675 2020-09-16
JP2020155675A JP7463242B2 (ja) Receiving device, server, and voice information processing system

Publications (1)

Publication Number Publication Date
WO2021155812A1 (fr)

Family

ID=77199664

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/075126 WO2021155812A1 (fr) Receiving device, server, and voice information processing system

Country Status (2)

Country Link
CN (1) CN113498538A (fr)
WO (1) WO2021155812A1 (fr)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108986803A * 2018-06-26 2018-12-11 北京小米移动软件有限公司 Scene control method and device, electronic device, and readable storage medium
CN109727597A * 2019-01-08 2019-05-07 未来电视有限公司 Interaction assistance method and device for voice information
CN110430465A * 2019-07-15 2019-11-08 深圳创维-Rgb电子有限公司 Learning method based on intelligent voice recognition, terminal, and storage medium
US20190387240A1 * 2017-01-05 2019-12-19 Alcatel-Lucent Usa Inc. Compressive sensing with joint signal compression and quality control
CN110719441A * 2019-09-30 2020-01-21 傅程宏 System and method for bank personnel behavior compliance early-warning management

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1735174A (zh) * 2004-08-02 2006-02-15 上海乐金广电电子有限公司 Method and system for content-based trick play service
JP2014027635A (ja) * 2012-07-30 2014-02-06 Sharp Corp Portable terminal device and information communication system
CN108062212A (zh) * 2016-11-08 2018-05-22 沈阳美行科技有限公司 Scene-based voice operation method and device
KR20180110974A (ko) * 2017-03-30 2018-10-11 엘지전자 주식회사 Voice server, voice recognition server system, and operating method thereof
JP7026449B2 (ja) * 2017-04-21 2022-02-28 ソニーグループ株式会社 Information processing device, receiving device, and information processing method
CN109089140A (zh) * 2017-06-14 2018-12-25 北京优朋普乐科技有限公司 Voice control method and device


Also Published As

Publication number Publication date
CN113498538A (zh) 2021-10-12

Similar Documents

Publication Publication Date Title
KR101763887B1 (ko) Content synchronization apparatus and method for providing synchronized interaction between devices
JP5395813B2 (ja) Techniques for consuming content and metadata
JP5155194B2 (ja) Guide to recommended recordings and downloads
KR101010378B1 (ko) Television receiving apparatus
US8554884B2 (en) Setting and modifying method of user operating interface for use in digital audio/video playback system
JP2013017172A (ja) Broadcast stream receiving apparatus and method
KR101873793B1 (ko) Content synchronization apparatus and method for providing synchronized interaction between devices
WO2010066189A1 (fr) Method and device for fast navigation within programs
JP2008527788A (ja) Broadcast content information providing method and system therefor
KR20160039830A (ko) Multimedia device and voice guide providing method thereof
US20090147140A1 (en) Image apparatus for processing plurality of images and control method thereof
US20150334439A1 (en) Method and system for displaying event messages related to subscribed video channels
WO2021155812A1 (fr) Receiving device, server, and voice information processing system
US20160192022A1 (en) Electronic device, method, and storage medium
CN102595232A (zh) Digital television program related information search method and digital television receiving terminal
WO2021009989A1 (fr) Artificial intelligence information processing device and method, and display device having an artificial intelligence function
KR20230029438A (ko) Display device and method for controlling a display device
US20090013355A1 (en) Broadcast scheduling method and broadcast receiving apparatus using the same
JP7463242B2 (ja) Receiving device, server, and voice information processing system
JP2003348560A (ja) Broadcasting method and broadcast terminal device realizing a broadcast start notification service
US20230209102A1 (en) Electronic device and operating method therefor
JP2015115802A (ja) Electronic device, method, and computer-readable recording medium
WO2021109839A1 (fr) Instruction control apparatus and method, and non-volatile storage medium
CN114766054B (zh) Receiving device and generation method
JP7207307B2 (ja) Information processing device, information processing method, and program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21750015

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21750015

Country of ref document: EP

Kind code of ref document: A1