WO2020090790A1 - Information processing device - Google Patents


Info

Publication number
WO2020090790A1
Authority
WO
WIPO (PCT)
Prior art keywords
keyword
user
unit
association
keywords
Application number
PCT/JP2019/042300
Other languages
French (fr)
Japanese (ja)
Inventor
田中 彰
翔 七尾
広樹 石塚
昇悟 池田
充弘 小形
誠 村崎
Original Assignee
株式会社NTTドコモ
Application filed by 株式会社NTTドコモ
Priority to JP2020553924A (JPWO2020090790A1)
Publication of WO2020090790A1

Classifications

    • G06F 16/9032: Information retrieval; Querying; Query formulation
    • G06F 16/9035: Information retrieval; Querying; Filtering based on additional data, e.g. user or group profiles
    • G06F 3/01: Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/0481: Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment
    • G06F 3/16: Sound input; Sound output
    • G10L 15/10: Speech classification or search using distance or distortion measures between unknown speech and reference templates

Definitions

  • The present invention relates to an information processing device.
  • Patent Document 1 discloses a technique in which, when a user uses a pointing device to specify an object image within an image displayed on a display device, a recommendation regarding that object image is displayed. Patent Document 2 discloses a technique of recommending information about an object image designated by the user's voice, in response to that voice.
  • In these conventional techniques, however, the user needs to specify the object image. That is, when the user does not specify an object image, a comment such as a recommendation cannot be generated in response to an ambiguous statement by the user.
  • To solve this problem, an information processing apparatus according to a preferred aspect of the present invention includes: a first keyword generation unit that generates a first keyword based on a voice of a user; a second keyword generation unit that generates a plurality of second keywords corresponding one-to-one to a plurality of object images extracted from an image indicated by an image signal; an identifying unit that identifies, based on the degree of association between each of the plurality of second keywords and the first keyword, a target keyword to be a comment target from among the plurality of second keywords; and a comment generation unit that generates a comment related to the target keyword.
  • With this information processing apparatus, a comment can be generated in response to a user's ambiguous statement without the user designating an object image as the comment target.
  • FIG. 1 is a block diagram showing the overall configuration of a service system according to the first embodiment of the present invention.
  • The service system 1 shown in FIG. 1 provides a moving image distribution service, which provides, for example, movies or terrestrial digital broadcast content.
  • The service system 1 includes user devices 20_1 to 20_m (m is an integer of 1 or more) managed by users U_1 to U_m, a network NW, and a moving image distribution server 10. In the following description, when elements of the same type need not be distinguished, only the common part of the reference numeral is used, as in user device 20 or user U.
  • The user device 20 is an information processing device that processes various types of information. The user device 20 is, for example, a portable information processing device such as a smartphone or a tablet terminal, although any information processing device can be adopted; it may also be a terminal-type information device such as a personal computer.
  • The user device 20 can receive the image signal Sg transmitted from the moving image distribution server 10 and display the image, or transmit the image signal Sg to the television receiver 30 so that the television receiver 30 displays the image.
  • The user U may speak while watching a moving image, for example to give an impression of it or to mutter a remark about it. In this case, although the user U's statement relates to the moving image, the statement is often so ambiguous that it cannot be uniquely determined which object in the image the statement relates to. The user device 20 therefore has a function of generating a comment, such as a recommendation, in response to such a vague statement by the user U.
  • FIG. 2 is a block diagram illustrating a hardware configuration of the user device 20.
  • The user device 20 is realized by a computer system including a processing device 21, a storage device 22, a communication device 23, an output device 24, an input device 25, a short-range wireless communication device 26, a GPS (Global Positioning System) device 27, and a bus 28. The processing device 21, the storage device 22, the communication device 23, the output device 24, the input device 25, the short-range wireless communication device 26, and the GPS device 27 are connected by the bus 28 for communicating information. The bus 28 may be configured as a single bus, or as different buses between different devices. Each element of the user device 20 is configured by one or more devices, and some elements of the user device 20 may be omitted.
  • The processing device 21 is a processor that controls the entire user device 20 and is composed of, for example, a single chip or a plurality of chips. The processing device 21 is configured by, for example, a central processing unit (CPU) including an interface with peripheral devices, an arithmetic device, registers, and the like. Some or all of the functions of the processing device 21 may be realized by hardware such as a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), a PLD (Programmable Logic Device), or an FPGA (Field Programmable Gate Array). The processing device 21 executes various processes in parallel or sequentially.
  • The storage device 22 is a recording medium readable by the processing device 21. The storage device 22 stores a plurality of programs including the control program PRa executed by the processing device 21, a keyword table TBLa, a comment table TBLb, and various data used by the processing device 21. The storage device 22 is composed of one or more types of storage circuits such as a ROM (Read Only Memory), an EPROM (Erasable Programmable ROM), an EEPROM (Electrically Erasable Programmable ROM), and a RAM (Random Access Memory).
  • The keyword table TBLa stores a plurality of words, roughly divided into nouns and adjectives. Noun words correspond to keywords; the first keyword KW1 and the second keyword KW2 described later are among the noun words stored in the keyword table TBLa. Each adjective word is stored in association with a noun word. Since adjectives modify nouns, the association between an adjective word and a noun word is determined according to the modification relation between the words. For example, the adjective word "delicious" is associated with the noun word "food and drink".
  • FIG. 3 is an explanatory diagram showing the data structure of the noun words stored in the keyword table TBLa. The noun words are arranged in a tree structure in which a plurality of words are hierarchized according to meaning. In this example the words are classified into a first layer to a fourth layer, although the number of layers is not limited to four.
  • The keyword table TBLa also stores degrees of association, each indicating the degree of association between one noun word and another. The degree of association is determined in consideration of the relationship between superordinate and subordinate concepts, as well as the use and function of the object indicated by each word. For example, "sake" and "wine" are both subordinate concepts of "liquor". An "ochoko" (a small sake cup) is not a subordinate concept of "liquor", but an "ochoko" is used to drink sake. For this reason, the degree of association between "sake" and "ochoko" is higher than the degree of association between "sake" and "wine".
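  • As a concrete illustration, the following is a minimal Python sketch of how the keyword table TBLa described above might be modeled. The patent does not specify a data format, so the container layout, the function name, and the numeric scores are all assumptions for illustration.
```python
# Hypothetical, minimal model of the keyword table TBLa.
# The layout and the numeric scores are illustrative assumptions.

# Adjective words stored in association with noun words.
ADJECTIVE_TO_NOUN = {
    "delicious": "food and drink",  # "delicious" modifies "food and drink"
}

# Degree of association between pairs of noun words (order-independent).
# "sake" relates more strongly to "ochoko" than to "wine", as in the text.
RELATEDNESS = {
    frozenset(("sake", "ochoko")): 0.9,
    frozenset(("sake", "wine")): 0.6,
}

def degree_of_association(kw1: str, kw2: str) -> float:
    """Look up the stored degree of association; 0.0 if no entry exists."""
    return RELATEDNESS.get(frozenset((kw1, kw2)), 0.0)

assert degree_of_association("sake", "ochoko") > degree_of_association("sake", "wine")
```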
  • The communication device 23 is a device that communicates with other devices via the network NW, such as a mobile communication network or the Internet. The communication device 23 may also be referred to as, for example, a network device, a network controller, a network card, or a communication module. The communication device 23 can communicate with the moving image distribution server 10 via the network NW.
  • The output device 24 notifies the user U of various information under the control of the processing device 21. The output device 24 includes a display device 241 and a speaker 242. The display device 241 displays images; various display panels such as a liquid crystal display panel or an organic EL (Electro Luminescence) display panel are suitable as the display device 241. Sound data is supplied from the processing device 21 to the speaker 242. The speaker 242 includes a DA converter, which converts the sound data into an analog signal, and the speaker 242 is driven by that analog signal.
  • The input device 25 is a device for the user U to input information for using the user device 20. The input device 25 accepts input operations by the user U and, in this example, includes a microphone 251 and a touch panel 252. The touch panel 252 detects contact by the user U with the display surface of the display device 241 and, based on the contact position, accepts operations for inputting characters such as numbers and letters and operations for selecting icons displayed on the display device 241. The microphone 251 converts the voice of the user U into an analog electric signal and outputs it as the audio signal Sa. The audio signal Sa is converted into a digital signal by an AD converter (not shown) and supplied to the processing device 21 via the bus 28.
  • The short-range wireless communication device 26 is a device that communicates with other devices by short-range wireless communication; examples include Bluetooth (registered trademark), ZigBee (registered trademark), and WiFi (registered trademark). The television receiver 30, for example, corresponds to such another device.
  • The GPS device 27 receives radio waves from a plurality of satellites and generates position information from the received radio waves. The position information indicates the position of the user device 20 and may have any format as long as the position can be specified; for example, it indicates the latitude and longitude of the user device 20. Although the position information is obtained from the GPS device 27 in the present embodiment, the user device 20 may acquire the position information by any other method.
  • For example, the user device 20 may acquire the position information by using the cell ID assigned to the base station with which the user device 20 communicates. The user device 20 may also acquire the position information by referring to a database in which network identification addresses (MAC (Media Access Control) addresses) assigned to access points are associated with actual addresses (positions). The user device 20 may also receive ID information included in an advertisement packet conforming to the BLE (Bluetooth Low Energy) standard using the short-range wireless communication device 26 and acquire the position information based on that ID information.
  • FIG. 4 is a functional block diagram showing the functions of the user device 20.
  • The processing device 21 functions as a first keyword generation unit 210, a second keyword generation unit 220A, an identifying unit 230A, and a comment generation unit 240 by reading the control program PRa from the storage device 22 and executing it.
  • The first keyword generation unit 210 generates the first keyword KW1 based on the voice of the user U indicated by the audio signal Sa. Specifically, the first keyword generation unit 210 analyzes the voice of the user U and extracts nouns and adjectives from the analysis result. When the voice of the user U includes both a noun and an adjective, the first keyword generation unit 210 identifies the noun as the attention word. For example, when the voice of the user U is "that red car is suspicious", the noun "car" is identified as the attention word. When the voice of the user U includes no noun but does include an adjective, the first keyword generation unit 210 identifies the adjective as the attention word. For example, when the voice of the user U is "that looks delicious", the adjective "delicious" is identified as the attention word.
  • The first keyword generation unit 210 then determines whether the attention word is included in the keyword table TBLa. When the determination result is negative, the first keyword generation unit 210 does not generate the first keyword KW1; the first keyword KW1 is thus limited to the keywords included in the keyword table TBLa. When the determination result is affirmative and the attention word is a noun, the first keyword generation unit 210 generates the attention word as the first keyword KW1. When the determination result is affirmative and the attention word is an adjective, the first keyword generation unit 210 refers to the keyword table TBLa and generates the noun word associated with the attention word as the first keyword KW1. For example, when the attention word is "delicious", the first keyword generation unit 210 generates "food and drink" as the first keyword KW1.
  • In this way, the first keyword generation unit 210 generates a first keyword KW1 related to the statement of the user U even when that statement is ambiguous.
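  • The first-keyword logic described above can be sketched as follows. This is a sketch under the assumption that speech recognition and part-of-speech extraction have already produced lists of nouns and adjectives; the function and table names are hypothetical.
```python
from typing import List, Optional

# Hypothetical excerpts of the noun words and adjective associations in TBLa.
KEYWORD_TABLE_NOUNS = {"car", "food and drink", "sake", "wine"}
ADJECTIVE_TO_NOUN = {"delicious": "food and drink"}

def generate_first_keyword(nouns: List[str], adjectives: List[str]) -> Optional[str]:
    """Return KW1, or None when the attention word is not in the keyword table.

    Mirrors the described priority: a noun is the attention word when present;
    otherwise an adjective is mapped to its associated noun word.
    """
    if nouns:
        attention_word = nouns[0]
        return attention_word if attention_word in KEYWORD_TABLE_NOUNS else None
    if adjectives:
        # ADJECTIVE_TO_NOUN only contains adjectives present in TBLa,
        # so a miss here also means "not in the keyword table".
        return ADJECTIVE_TO_NOUN.get(adjectives[0])
    return None

# "That red car is suspicious" -> the noun "car" is the attention word.
assert generate_first_keyword(["car"], ["red"]) == "car"
# "That looks delicious" -> the adjective maps to the noun "food and drink".
assert generate_first_keyword([], ["delicious"]) == "food and drink"
```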
  • The second keyword generation unit 220A generates a second keyword KW2 for each of the object images extracted from the image indicated by the image signal Sg. The second keyword generation unit 220A has an extraction unit 221 and a conversion unit 222.
  • The extraction unit 221 extracts a plurality of object images from the image indicated by the image signal Sg; an image of one screen usually contains many object images. The images extracted by the extraction unit 221 are, for example, the object images OB1 to OB5 shown in FIG. 5.
  • The conversion unit 222 converts each of the object images OB1 to OB5 extracted by the extraction unit 221 into a second keyword KW2, for example using an image recognition model learned by machine learning. The second keyword KW2 is included in the keywords stored in the keyword table TBLa.
  • For example, the object image OB1 shown in FIG. 5 is converted into "wine", the object image OB2 into "wine glass", the object image OB3 into "clock", the object image OB4 into "candle", and the object image OB5 into "western food".
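  • The extraction and conversion steps might look like the sketch below. The patent only says that the conversion uses "an image recognition model learned by machine learning", so the recognizer is represented here by a caller-supplied placeholder function.
```python
from typing import Callable, Iterable, List

def generate_second_keywords(object_images: Iterable[object],
                             recognize: Callable[[object], str],
                             table_keywords: set) -> List[str]:
    """Convert each extracted object image into a second keyword KW2.

    `recognize` stands in for the learned image recognition model; only
    labels that exist in the keyword table TBLa are kept as keywords.
    """
    return [label for label in (recognize(img) for img in object_images)
            if label in table_keywords]

# Illustrative stand-in for the object images OB1 to OB5 of FIG. 5.
fake_model_output = {"OB1": "wine", "OB2": "wine glass", "OB3": "clock",
                     "OB4": "candle", "OB5": "western food"}
second_keywords = generate_second_keywords(
    fake_model_output,                # iterate over "OB1".."OB5"
    recognize=fake_model_output.get,  # placeholder recognizer
    table_keywords={"wine", "wine glass", "clock", "candle", "western food"})
print(second_keywords)  # ['wine', 'wine glass', 'clock', 'candle', 'western food']
```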
  • The identifying unit 230A identifies the target keyword Wx from among the plurality of second keywords KW2 generated by the second keyword generation unit 220A, based on the degree of association between each second keyword KW2 and the first keyword KW1. More specifically, the identifying unit 230A refers to the keyword table TBLa to acquire, for each pair of a second keyword KW2 and the first keyword KW1, the degree of association between them, and identifies as the target keyword Wx the second keyword KW2 belonging to the pair with the highest degree of association. The comment generation unit 240, described below, generates a comment related to the target keyword Wx identified in this way.
  • In the example of FIG. 5, the identifying unit 230A refers to the keyword table TBLa to acquire the degree of association between "food and drink" and each of "wine", "wine glass", "clock", "candle", and "western food", and identifies as the target keyword Wx the second keyword KW2 with the highest of these degrees of association.
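  • Reduced to code, the identification step is essentially an argmax over the acquired degrees of association. A minimal sketch, reusing the hypothetical degree_of_association lookup from the earlier sketch:
```python
from typing import Callable, List

def identify_target_keyword(first_keyword: str,
                            second_keywords: List[str],
                            degree_of_association: Callable[[str, str], float]) -> str:
    """Identifying unit 230A: pick the KW2 whose pair with KW1 scores highest."""
    return max(second_keywords,
               key=lambda kw2: degree_of_association(first_keyword, kw2))
```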
  • The comment generation unit 240 generates a comment related to the target keyword Wx. A comment here means a description or explanation of the target keyword Wx, and is a concept that includes recommendations; a comment may therefore include information about a product recommended to the user U in relation to the target keyword Wx, or about a store handling that product.
  • The comment generation unit 240 generates a comment by reading from the comment table TBLb the comment stored in association with the target keyword Wx. Alternatively, the comment generation unit 240 may access a search site connected to the network NW, acquire information related to the target keyword Wx from the search site, and output the acquired information as the comment. For example, when the target keyword Wx is "ramen", the comment generation unit 240 may search for ramen restaurants near the position indicated by the position information generated by the GPS device 27 and output the search result as the comment.
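  • A minimal sketch of the table-based comment generation, assuming the comment table TBLb maps a keyword to a prepared comment string; the stored comment text here is invented for illustration, and the search-site path described above is omitted.
```python
# Hypothetical excerpt of the comment table TBLb.
COMMENT_TABLE = {
    "wine": "A wine like the one on screen pairs well with western food.",
}

def generate_comment(target_keyword: str) -> str:
    """Comment generation unit 240: read the comment for Wx from TBLb."""
    return COMMENT_TABLE.get(
        target_keyword,
        f"No comment is stored for {target_keyword!r}.")
```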
  • FIG. 6 is a flowchart showing the operation of the user device 20.
  • First, the processing device 21 identifies the attention word based on the voice of the user U (step S1). Specifically, the processing device 21 extracts the attention word by performing voice recognition processing that converts the voice of the user U into text and identification processing that identifies nouns and adjectives in the converted text. The attention word is a noun, or an adjective if no noun is identified.
  • Next, the processing device 21 determines whether the attention word is included in the keyword table TBLa (step S2). The processing device 21 repeats the processes of steps S1 and S2 until an attention word included in the keyword table TBLa is identified, that is, until the determination result of step S2 becomes affirmative.
  • When the determination result of step S2 is affirmative, the processing device 21 determines whether the attention word is a noun (step S3). When the determination result of step S3 is affirmative, the processing device 21 generates the attention word as the first keyword KW1 (step S4). As noted above, the attention word is either a noun or an adjective, so when the determination result of step S3 is negative, the attention word is an adjective; in that case the processing device 21 refers to the keyword table TBLa and generates the noun word associated with the attention word as the first keyword KW1 (step S5).
  • Next, the processing device 21 extracts object images from the image indicated by the image signal Sg (step S6). Since a plurality of object images usually exist in one frame image, the processing device 21 extracts a plurality of object images in step S6. After that, the processing device 21 converts each of the extracted object images into a second keyword KW2 (step S7). The processing device 21 then identifies the target keyword Wx from among the plurality of second keywords KW2 generated in step S7, based on the degree of association between each second keyword KW2 and the first keyword KW1 (step S8). Finally, the processing device 21 generates a comment related to the target keyword Wx (step S9), for example by reading from the comment table TBLb the comment stored in association with the target keyword Wx.
  • The processing device 21 outputs the generated comment by one of the following methods.
  • (a) The processing device 21 causes the display device 241 to display the moving image represented by the moving image data on which the image of the generated comment is superimposed.
  • (b) The processing device 21 uses the short-range wireless communication device 26 to transmit, to the television receiver 30, the moving image data on which the image of the generated comment is superimposed.
  • (c) The processing device 21 converts the generated comment into sound data, synthesizes the sound data representing the comment with the sound data of the moving image, and causes the speaker 242 to emit the synthesis result.
  • (d) The processing device 21 uses the short-range wireless communication device 26 to transmit, to the television receiver 30, the synthesis result of the sound data representing the comment and the sound data of the moving image.
  • The methods (a) to (d) may be combined arbitrarily.
  • Note that the processing device 21 functions as the first keyword generation unit 210 in steps S1 to S5, as the extraction unit 221 in step S6, as the conversion unit 222 in step S7, as the identifying unit 230A in step S8, and as the comment generation unit 240 in step S9.
  • As described above, the information processing apparatus exemplified by the user device 20 includes: the first keyword generation unit 210 that generates the first keyword KW1 based on the voice of the user U; the second keyword generation unit 220A that generates a second keyword KW2 for each of the plurality of object images extracted from the image indicated by the image signal Sg; the identifying unit 230A that identifies, based on the degree of association between each second keyword KW2 and the first keyword KW1, the target keyword Wx to be the comment target from among the plurality of second keywords KW2 generated by the second keyword generation unit 220A; and the comment generation unit 240 that generates a comment related to the target keyword Wx.
  • With this configuration, a comment can be generated in response to a vague statement by the user U without the user U designating the object image to be the comment target.
  • The identifying unit 230A may also identify a second keyword KW2 that matches the first keyword KW1 as the target keyword Wx. In this case, the identifying unit 230A determines whether each of the plurality of second keywords KW2 matches the first keyword KW1, and when the determination result for any of them is affirmative, identifies the matching second keyword KW2 as the target keyword Wx. It is then unnecessary to refer to the keyword table TBLa to acquire the degree of association, so the processing load can be reduced.
  • The service system 1 of the second embodiment is the same as the service system 1 of the first embodiment except for the function of the processing device 21 in the user device 20.
  • FIG. 7 is a functional block diagram showing the functions of the processing device 21 of the second embodiment. The processing device 21 of the second embodiment differs from the processing device 21 of the first embodiment in that it includes a second keyword generation unit 220B instead of the second keyword generation unit 220A. The second keyword generation unit 220B includes an extraction unit 221, a conversion unit 222, and an analysis unit 223.
  • The image signal Sg is supplied to the analysis unit 223, which analyzes the image signal Sg of the moving image and outputs the analysis result to the extraction unit 221. For example, the analysis unit 223 evaluates each of the plurality of object images included in an arbitrary frame of the image signal Sg on first to fourth evaluation items and outputs the total of the evaluation values to the extraction unit 221 as the analysis result. In this case, the extraction unit 221 extracts the object images whose total evaluation value exceeds a predetermined value.
  • The first evaluation item is the ratio of the area of the object image to the area of one screen; the larger this ratio, the higher the evaluation value of the object image.
  • The second evaluation item is the depth of the object image as seen from the user U; the closer the object image is to the front, the higher its evaluation value.
  • The third evaluation item is the brightness of the object image; the brighter the object image, the higher its evaluation value.
  • The fourth evaluation item is the position of the object image; the closer the object image is to the center of the screen, the higher its evaluation value.
  • Each of the first to fourth evaluation items reflects an element that attracts the interest of the user U within the image of one screen. A sketch of this scoring appears below.
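  • A minimal sketch of the analysis and extraction just described, assuming each evaluation item is already measured as a normalized quantity; the 0-to-5 scales and the field names are assumptions, not values from the patent.
```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class ObjectImageStats:
    """Per-object measurements assumed to be produced by the analysis unit 223."""
    area_ratio: float  # item 1: share of the one-screen area, 0..1
    depth: float       # item 2: 0 = frontmost, 1 = farthest back
    brightness: float  # item 3: 0 = dark, 1 = bright
    centrality: float  # item 4: 0 = screen edge, 1 = screen center

def total_evaluation_value(s: ObjectImageStats) -> float:
    """Sum of the four evaluation values (each scaled 0..5 here)."""
    return (5 * s.area_ratio        # larger objects score higher
            + 5 * (1 - s.depth)     # objects nearer the front score higher
            + 5 * s.brightness      # brighter objects score higher
            + 5 * s.centrality)     # objects nearer the center score higher

def extract_object_images(stats: Dict[str, ObjectImageStats],
                          predetermined_value: float) -> List[str]:
    """Extraction unit 221: keep objects whose total exceeds the threshold."""
    return [name for name, s in stats.items()
            if total_evaluation_value(s) > predetermined_value]
```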
  • FIG. 8 is a flowchart showing the operation of the user device 20 according to the second embodiment.
  • The flowchart shown in the figure is the same as the flowchart of the first embodiment shown in FIG. 6 except that steps S6_1 and S6_2 are executed instead of step S6. The differences are described below.
  • In step S6_1, the processing device 21 functions as the analysis unit 223, acquires the evaluation values obtained by evaluating each object image included in a frame on each evaluation item, and calculates the sum of these evaluation values for each object image. Suppose the analysis results for the object images OB1 to OB5 shown in FIG. 5 are as shown in FIG. 9; the sums of the evaluation values for the object images OB1 to OB5 then range from "11" to "16".
  • In step S6_2, the processing device 21 functions as the extraction unit 221 and extracts the object images whose total evaluation value exceeds a predetermined value. For example, if the predetermined value is "13" and the sums of the evaluation values are as shown in FIG. 9, the processing device 21 extracts the object images OB2 and OB5. The processing from step S7 onward is the same as in the first embodiment described with reference to FIG. 6, so its description is omitted.
  • As described above, the second keyword generation unit 220B includes the analysis unit 223 that analyzes the image signal Sg, the extraction unit 221 that extracts a plurality of object images from the image indicated by the image signal Sg based on the analysis result of the analysis unit 223, and the conversion unit 222 that converts each of the plurality of object images into a second keyword KW2.
  • Accordingly, the number of object images to be extracted can be reduced compared with the case where object images are extracted without using the analysis result, so the processing load of the conversion unit 222 can be reduced.
  • The analysis unit 223 may analyze the image signal Sg over a plurality of frames and generate an analysis result. In that case, the analysis unit 223 may adopt a fifth evaluation item regarding the movement of the object image in addition to the first to fourth evaluation items. An example of the fifth evaluation item is the number of frames, which corresponds to the length of time that a moving object image exists in the screen: the larger the number of frames (that is, the longer the object image remains in the screen), the higher the evaluation value of the object image. For example, when the protagonist of a movie moves, the movie is often shot so as to follow that movement; when the moving image represented by the image signal Sg is a movie, the evaluation values of the object image representing the protagonist and of the object images representing the protagonist's belongings are therefore increased.
  • The service system 1 of the third embodiment is the same as the service system 1 of the first embodiment except for the function of the processing device 21 in the user device 20 and the stored contents of the storage device 22.
  • FIG. 10 is a functional block diagram showing the functions of the processing device 21 of the third embodiment. The processing device 21 of the third embodiment differs from the processing device 21 of the first embodiment in that it includes an identifying unit 230B instead of the identifying unit 230A.
  • The storage device 22 of the user device 20 of the third embodiment stores an action history table TBLc, which stores the action history of the user U. The action history includes, for example, the user U's Internet search history, purchase history of products and services, SNS (Social Networking Service) activity, and Web browser bookmarks.
  • The identifying unit 230B identifies the target keyword Wx from among the plurality of second keywords KW2 generated by the second keyword generation unit 220A, based on the degree of association between each second keyword KW2 and the first keyword KW1 and on the action history of the user U.
  • Specifically, the identifying unit 230B first selects, from among the plurality of second keywords KW2 generated by the second keyword generation unit 220A, the second keywords KW2 whose degree of association with the first keyword KW1 is equal to or greater than a predetermined value; the selected second keywords KW2 become candidates for the target keyword Wx. The identifying unit 230B then identifies the target keyword Wx from among the selected second keywords KW2 by referring to the action history stored in the action history table TBLc.
  • For example, assume that the second keywords KW2 selected based on the degree of association are "wine" and "western food", and that a wine purchase history is recorded in the action history table TBLc. When the identifying unit 230B finds, by referring to the action history table TBLc, that the user U has a purchase history of wine, it identifies "wine", out of "wine" and "western food", as the target keyword Wx. As a result, the comment generation unit 240 can generate a comment regarding wine.
  • FIG. 11 is a flowchart showing the operation of the user device 20 according to the third embodiment.
  • The flowchart shown in the figure is the same as the flowchart of the first embodiment shown in FIG. 6 except that steps S8_1 and S8_2 are executed instead of step S8. The differences are described below.
  • In step S8_1, the processing device 21 functions as the identifying unit 230B and selects, from among the plurality of second keywords KW2 generated in step S7, the second keywords KW2 whose degree of association with the first keyword KW1 is equal to or greater than a predetermined value.
  • In step S8_2, the processing device 21 functions as the identifying unit 230B and, based on the action history, identifies as the target keyword Wx the second keyword KW2 related to the action history from among the second keywords KW2 selected in step S8_1.
  • As described above, the identifying unit 230B identifies the target keyword Wx from among the plurality of second keywords KW2 based on the degree of association and the action history of the user U. Since the target keyword Wx is identified in consideration of the action history of the user U, a comment of higher interest to the user U can be provided than when the action history is not considered.
  • In the above description, the identifying unit 230B first selects the candidate second keywords KW2 using the degree of association (step S8_1) and then identifies the target keyword Wx based on the action history (step S8_2), but the order may be reversed: the identifying unit 230B may first select the candidate second keywords KW2 based on the action history and then identify the target keyword Wx using the degree of association.
  • The identifying unit 230B may also use the action history and the degree of association simultaneously to identify the target keyword Wx from among the plurality of second keywords KW2. For example, the identifying unit 230B may add a predetermined value to the degree of association of each second keyword KW2 related to the action history and identify the target keyword Wx by comparing the adjusted degrees of association among the plurality of second keywords KW2. Both variants are sketched below.
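  • A minimal sketch of both orderings; the threshold, the bonus value, the use of a set of purchased keywords as the action history, and the fallback behavior when no candidate matches the history are assumptions for illustration.
```python
from typing import Callable, List, Set

def identify_by_association_then_history(
        kw1: str, second_keywords: List[str],
        degree_of_association: Callable[[str, str], float],
        purchase_history: Set[str], predetermined_value: float) -> str:
    """Steps S8_1/S8_2: association filter first, then the action history."""
    candidates = [kw for kw in second_keywords
                  if degree_of_association(kw1, kw) >= predetermined_value]
    history_related = [kw for kw in candidates if kw in purchase_history]
    # Fall back to the plain association maximum if the history matches nothing.
    pool = history_related or candidates or second_keywords
    return max(pool, key=lambda kw: degree_of_association(kw1, kw))

def identify_with_history_bonus(
        kw1: str, second_keywords: List[str],
        degree_of_association: Callable[[str, str], float],
        purchase_history: Set[str], bonus: float) -> str:
    """Simultaneous variant: add a predetermined value for history-related keywords."""
    def adjusted(kw2: str) -> float:
        extra = bonus if kw2 in purchase_history else 0.0
        return degree_of_association(kw1, kw2) + extra
    return max(second_keywords, key=adjusted)
```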
  • The service system 1 of the fourth embodiment is the same as the service system 1 of the first embodiment except for the function of the processing device 21 in the user device 20 and the stored contents of the storage device 22.
  • FIG. 12 is a functional block diagram showing the functions of the processing device 21 of the fourth embodiment. The processing device 21 of the fourth embodiment differs from the processing device 21 of the first embodiment in that it includes an identifying unit 230C instead of the identifying unit 230A.
  • The storage device 22 of the user device 20 of the fourth embodiment stores profile data DP and an evaluation table TBLd. The profile data DP indicates the profile of the user U; a profile means attributes of the user U and includes items such as age and sex. In the evaluation table TBLd, an evaluation value for each profile item is stored in association with each keyword. The evaluation value indicates the degree of interest of the user U in the keyword.
  • FIG. 13 shows an example of the stored contents of the evaluation table TBLd. For the keyword "car", for example, the evaluation value for the sex "male" is "7", whereas the evaluation value for the sex "female" is "4", indicating that men tend to be more interested in cars than women.
  • The identifying unit 230C identifies the target keyword Wx from among the plurality of second keywords KW2 generated by the second keyword generation unit 220A, based on the degree of association between each second keyword KW2 and the first keyword KW1 and on the profile of the user U.
  • Specifically, the identifying unit 230C first selects, from among the plurality of second keywords KW2 generated by the second keyword generation unit 220A, the second keywords KW2 whose degree of association with the first keyword KW1 is equal to or greater than a predetermined value. The identifying unit 230C then identifies the target keyword Wx from among the selected second keywords KW2 using the profile data DP and the evaluation table TBLd: for each selected second keyword KW2, it calculates a total evaluation value by summing the evaluation values corresponding to the items of the user U's profile, and identifies the second keyword KW2 with the highest total evaluation value as the target keyword Wx.
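  • A minimal sketch of the profile-based identification; the "car" row reuses the FIG. 13 example values quoted above, while the "wine" row and the table layout are invented for illustration.
```python
from typing import Dict, List

# Hypothetical excerpt of the evaluation table TBLd:
# keyword -> {profile item value: evaluation value}.
EVALUATION_TABLE: Dict[str, Dict[str, int]] = {
    "car":  {"male": 7, "female": 4},   # values from the FIG. 13 example
    "wine": {"male": 5, "female": 6},   # invented for illustration
}

def identify_by_profile(candidates: List[str], profile_items: List[str]) -> str:
    """Identifying unit 230C: pick the candidate with the highest total
    evaluation value summed over the user's profile items (age, sex, ...)."""
    def total_evaluation(kw: str) -> int:
        row = EVALUATION_TABLE.get(kw, {})
        return sum(row.get(item, 0) for item in profile_items)
    return max(candidates, key=total_evaluation)

print(identify_by_profile(["car", "wine"], ["male"]))  # car (7 > 5)
```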
  • FIG. 14 is a flowchart showing the operation of the user device 20 according to the fourth embodiment.
  • The flowchart shown in the figure is the same as the flowchart of the third embodiment shown in FIG. 11 except that step S8_3 is executed instead of step S8_2. The differences are described below.
  • In step S8_3, the processing device 21 functions as the identifying unit 230C and, based on the profile, identifies as the target keyword Wx the second keyword KW2 with the highest total evaluation value for the profile of the user U from among the second keywords KW2 selected in step S8_1.
  • As described above, the identifying unit 230C identifies the target keyword Wx from among the plurality of second keywords KW2 based on the degree of association and the profile of the user U. Since the target keyword Wx is identified in consideration of the profile of the user U, a comment of higher interest to the user U can be provided than when the profile is not considered.
  • In the above description, the identifying unit 230C first selects the candidate second keywords KW2 using the degree of association (step S8_1) and then identifies the target keyword Wx based on the profile (step S8_3), but the order may be reversed: the identifying unit 230C may first select the candidate second keywords KW2 based on the profile and then identify the target keyword Wx using the degree of association.
  • The identifying unit 230C may also use the profile and the degree of association simultaneously to identify the target keyword Wx from among the plurality of second keywords KW2. For example, the identifying unit 230C may add the profile-based total evaluation value to the degree of association and identify the target keyword Wx by comparing the addition results among the plurality of second keywords KW2.
  • The frame from which the extraction unit 221 extracts object images in the image of the image signal Sg may be chosen as follows.
  • The extraction unit 221 may extract object images from a frame with a high audience rating. For example, the extraction unit 221 may acquire the audience rating in real time from an external device and extract object images from frames whose acquired audience rating exceeds a predetermined audience rating. The user U is presumed to have higher interest in a frame with a higher audience rating than in other frames, so extracting object images from a frame of high interest to the user U allows a comment useful to the user U to be generated.
  • The extraction unit 221 may instead extract object images from a frame in which the user U cheers, based on the audio signal Sa of the user U.
  • The extraction unit 221 may also extract object images from a frame that is the subject of the program, based on program information. To identify such a frame, the extraction unit 221 may use the analysis unit 223 described in the second embodiment to analyze the image signal Sg, and the analysis unit 223 may acquire the program information from an external device via the network NW.
  • In each of the above embodiments, the image signal Sg has been described as a signal indicating a moving image, but the image signal Sg may instead be a signal indicating a still image.
  • In the above description, the degree of association between a second keyword KW2 and the first keyword KW1 is stored in the keyword table TBLa, but the present invention is not limited to this. The identifying units 230A, 230B, and 230C may instead obtain the degree of association from the number of nodes identified in the keyword table TBLa (an example of keyword data) having a tree structure in which a plurality of words are hierarchized according to meaning. In this case, the first keyword generation unit 210 generates a word included in the keyword table TBLa as the first keyword KW1, the second keyword generation units 220A and 220B generate words included in the keyword table TBLa as the second keywords KW2, and the identifying units 230A, 230B, and 230C acquire, as the degree of association, the number of nodes on the route from the first keyword KW1 to the second keyword KW2 in the tree structure of the keyword table TBLa.
  • More specifically, assume that the data structure of the keyword table TBLa is the tree structure shown in FIG. 3. When the first keyword KW1 is "liquor" and the second keyword KW2 is "fried potatoes", the route from "liquor" to "fried potatoes" is node "liquor" → node "drink" → node "food and drink" → node "food" → node "western food" → node "fried potatoes", so the number of nodes counted on this route is "5". By contrast, when the first keyword KW1 is "food and drink", the route from "food and drink" to "fried potatoes" is node "food and drink" → node "food" → node "western food" → node "fried potatoes", so the number of nodes counted on this route is "3". The smaller the number of nodes on the route connecting the first keyword KW1 and the second keyword KW2, the higher the degree of association; the degree of association between "food and drink" and "fried potatoes" is therefore higher than that between "liquor" and "fried potatoes". A sketch of this node counting appears below.
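  • The node counting in the example above can be sketched with a parent-link tree. The tree below encodes only the words named in the example; the counting matches the quoted figures, where the count is effectively the number of hops on the route.
```python
# Parent links for the part of FIG. 3's tree named in the example above.
PARENT = {
    "drink": "food and drink",
    "food": "food and drink",
    "liquor": "drink",
    "western food": "food",
    "sake": "liquor",
    "wine": "liquor",
    "fried potatoes": "western food",
}

def path_to_root(word: str) -> list:
    """Return [word, parent, grandparent, ..., root]."""
    path = [word]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def route_node_count(kw1: str, kw2: str) -> int:
    """Count the hops on the route between kw1 and kw2, as in the example."""
    p1, p2 = path_to_root(kw1), path_to_root(kw2)
    common = next(w for w in p1 if w in p2)  # lowest common ancestor
    return p1.index(common) + p2.index(common)

print(route_node_count("liquor", "fried potatoes"))          # 5
print(route_node_count("food and drink", "fried potatoes"))  # 3
```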
  • In the above description, the extraction unit 221 extracts object images without considering the action history of the user U, but it may instead extract object images from the image indicated by the image signal Sg based on the action history. For example, the extraction unit 221 may refer to the action history table TBLc described in the third embodiment, identify a favorite color of the user U from, for example, the product purchase history, and extract object images of the identified color. According to this modification, the object images can be narrowed down, so the processing load of the conversion unit 222 can be reduced.
  • The second keyword generation unit 220B of the second embodiment may also be used instead of the second keyword generation unit 220A in the third and fourth embodiments.
  • Each functional block may be realized by one device that is physically and/or logically coupled, or by two or more devices that are physically and/or logically separated and connected directly and/or indirectly (for example, by wire and/or wirelessly). For example, the function of the conversion unit 222 may be provided by a server device connected via the network NW, and the keyword table TBLa may be provided in the server device.
  • The term "apparatus" used in the description of each of the above embodiments may be replaced with another term such as circuit, device, or unit.
  • Input and output information and the like may be stored in a specific place (for example, a memory) or managed in a management table, and may be overwritten, updated, or appended. Output information and the like may be deleted, and input information and the like may be transmitted to another device.
  • A determination may be made by a value represented by one bit (0 or 1), by a Boolean value (true or false), or by a comparison of numerical values (for example, comparison with a predetermined value).
  • The storage device 22 has been exemplified as a recording medium readable by the processing device 21, such as a ROM and a RAM, but it may also be a flexible disk, a magneto-optical disk (for example, a compact disc, a digital versatile disc, or a Blu-ray (registered trademark) disc), a smart card, a flash memory device (for example, a card, a stick, or a key drive), a CD-ROM (Compact Disc-ROM), a register, a removable disk, a hard disk, a floppy (registered trademark) disk, a magnetic strip, a database, a server, or another suitable storage medium. The program may also be transmitted from the network NW; for example, the program may be transmitted from a communication network via an electric communication line.
  • Each of the above embodiments may be applied to systems utilizing LTE (Long Term Evolution), LTE-A (LTE-Advanced), SUPER 3G, IMT-Advanced, 4G, 5G, FRA (Future Radio Access), W-CDMA (registered trademark), GSM (registered trademark), CDMA2000, UMB (Ultra Mobile Broadband), IEEE 802.11 (Wi-Fi), IEEE 802.16 (WiMAX), IEEE 802.20, UWB (Ultra-WideBand), or Bluetooth (registered trademark), systems utilizing other appropriate systems, and/or next-generation systems extended based on these.
  • The information, signals, and the like described herein may be represented using any of a variety of different technologies. For example, data, instructions, commands, information, signals, bits, symbols, chips, and the like that may be mentioned throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or magnetic particles, optical fields or photons, or any combination of these. The terms described in this specification and/or the terms necessary for understanding this specification may be replaced with terms having the same or similar meanings.
  • Each function illustrated in FIGS. 4, 7, 10, and 12 is realized by an arbitrary combination of hardware and software. Further, each function may be realized by a single device, or may be realized by two or more devices that are configured separately from each other.
  • The program illustrated in each of the above embodiments, whether called software, firmware, middleware, microcode, hardware description language, or another name, should be construed broadly to mean instructions, instruction sets, code, code segments, program code, programs, subprograms, software modules, applications, software applications, software packages, routines, subroutines, objects, executable files, threads of execution, procedures, functions, and the like. Software, instructions, and the like may also be transmitted and received via a transmission medium. For example, when software is transmitted from a website, server, or other remote source using wired technology such as coaxial cable, optical fiber cable, twisted pair, and digital subscriber line (DSL) and/or wireless technology such as infrared, radio, and microwave, these wired and/or wireless technologies are included within the definition of a transmission medium.
  • The information, parameters, and the like described in this specification may be represented by absolute values, by values relative to a predetermined value, or by other corresponding information.
  • A mobile station may also be referred to by those skilled in the art as a subscriber station, mobile unit, subscriber unit, wireless unit, remote unit, mobile device, wireless device, wireless communication device, remote device, mobile subscriber station, access terminal, mobile terminal, wireless terminal, remote terminal, handset, user agent, mobile client, client, or some other suitable term.
  • The terms "connected" and "coupled", and any variations thereof, mean any direct or indirect connection or coupling between two or more elements, and can include the presence of one or more intermediate elements between two elements that are "connected" or "coupled" to each other. The coupling or connection between elements may be physical, logical, or a combination thereof. As some non-limiting and non-exhaustive examples, two elements can be considered "connected" or "coupled" to each other by using one or more wires, cables, and/or printed electrical connections, or by using electromagnetic energy such as electromagnetic energy having wavelengths in the radio frequency region, the microwave region, and the light (both visible and invisible) region.
  • Any reference to elements using designations such as "first" and "second" as used herein does not generally limit the quantity or order of those elements. These designations may be used herein as a convenient way of distinguishing between two or more elements. Thus, references to first and second elements do not mean that only two elements may be employed, or that the first element must precede the second element in some way.

Abstract

Provided is a user device comprising: a first keyword generation unit that generates a first keyword based on the voice of a user; a second keyword generation unit that generates a plurality of second keywords corresponding one-to-one to a plurality of object images extracted from an image indicated by an image signal; an identifying unit that identifies, based on the degree of association between each of the plurality of second keywords and the first keyword, a target keyword to be a comment target from among the plurality of second keywords generated by the second keyword generation unit; and a comment generation unit that generates a comment related to the target keyword.

Description

Patent Document 1: JP 2017-228177 A. Patent Document 2: JP 2013-88906 A.
Brief Description of Drawings
FIG. 1 is a block diagram showing the overall configuration of a service system according to a first embodiment of the present invention.
FIG. 2 is a block diagram illustrating the hardware configuration of a user device used in the first embodiment.
FIG. 3 is an explanatory diagram showing the data structure of the keyword table used in the first embodiment.
FIG. 4 is a functional block diagram showing the functions of the user device used in the first embodiment.
FIG. 5 is an explanatory diagram showing an example of object images in the first embodiment.
FIG. 6 is a flowchart showing the operation of the user device used in the first embodiment.
FIG. 7 is a functional block diagram showing the functions of the user device used in a second embodiment.
FIG. 8 is a flowchart showing the operation of the user device used in the second embodiment.
FIG. 9 is an explanatory diagram for explaining the evaluation results of object images in the second embodiment.
FIG. 10 is a functional block diagram showing the functions of the user device used in a third embodiment.
FIG. 11 is a flowchart showing the operation of the user device used in the third embodiment.
FIG. 12 is a functional block diagram showing the functions of the user device used in a fourth embodiment.
FIG. 13 is an explanatory diagram showing the stored contents of the evaluation table used in the fourth embodiment.
FIG. 14 is a flowchart showing the operation of the user device used in the fourth embodiment.
[1.第1実施形態]
[1.1.サービスシステムの構成]
 図1は、本発明の第1実施形態に係るサービスシステムの全体構成を示すブロック図である。図1に示されるサービスシステム1は、動画の配信サービスを提供する。動画の配信サービスは、例えば、映画又は地上波デジタル放送のコンテンツなどを提供する。
[1. First Embodiment]
[1.1. Service system configuration]
FIG. 1 is a block diagram showing the overall configuration of a service system according to the first embodiment of the present invention. The service system 1 shown in FIG. 1 provides a moving image distribution service. The moving image distribution service provides, for example, movies or terrestrial digital broadcasting contents.
 図1に例示するように、サービスシステム1は、ユーザU_1~ユーザU_mが管理するユーザ装置20_1~20_m(mは1以上の整数)と、ネットワークNWと、動画配信サーバ10とを備える。以下の説明では、同種の要素を区別しない場合には、ユーザ装置20又はユーザUのように、参照符号のうちの共通番号だけを使用する。 As illustrated in FIG. 1, the service system 1 includes user devices 20_1 to 20_m (m is an integer of 1 or more) managed by users U_1 to U_m, a network NW, and a video distribution server 10. In the following description, when the same type of element is not distinguished, only the common number among the reference numerals is used like the user device 20 or the user U.
 The user device 20 is an information processing device that processes various kinds of information. The user device 20 is, for example, a portable information processing device such as a smartphone or a tablet terminal. However, any information processing device may be adopted as the user device 20; it may be, for example, a terminal-type information device such as a personal computer.
 The user device 20 can receive the image signal Sg transmitted from the video distribution server 10 and display the image, or can transmit the image signal Sg to the television receiver 30 and cause the television receiver 30 to display the image.
 The user U may speak while watching a video. For example, the user U may state an impression of the video or mutter to himself or herself. In this case, although the utterance of the user U relates to the video, the utterance is often so ambiguous that it is impossible to uniquely determine which object in the image of the video the utterance relates to. The user device 20 has a function of generating a comment, such as a recommendation, in response to such an ambiguous utterance of the user U.
[1.2. Configuration of the user device]
 FIG. 2 is a block diagram illustrating the hardware configuration of the user device 20. The user device 20 is realized by a computer system including a processing device 21, a storage device 22, a communication device 23, an output device 24, an input device 25, a short-range wireless communication device 26, a GPS (Global Positioning System) device 27, and a bus 28. The processing device 21, the storage device 22, the communication device 23, the output device 24, the input device 25, the short-range wireless communication device 26, and the GPS device 27 are connected by the bus 28 for communicating information. The bus 28 may be configured as a single bus, or as different buses between devices. Each element of the user device 20 is configured by one or more devices, and some elements of the user device 20 may be omitted.
 The processing device 21 is a processor that controls the entire user device 20, and is composed of, for example, one or more chips. The processing device 21 is configured by, for example, a central processing unit (CPU) including an interface with peripheral devices, an arithmetic unit, registers, and the like. Some or all of the functions of the processing device 21 may be realized by hardware such as a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), a PLD (Programmable Logic Device), or an FPGA (Field Programmable Gate Array). The processing device 21 executes various kinds of processing in parallel or sequentially.
 The storage device 22 is a recording medium readable by the processing device 21. The storage device 22 stores a plurality of programs including a control program PRa executed by the processing device 21, a keyword table TBLa, a comment table TBLb, and various data used by the processing device 21. The storage device 22 is composed of one or more types of storage circuits such as a ROM (Read Only Memory), an EPROM (Erasable Programmable ROM), an EEPROM (Electrically Erasable Programmable ROM), and a RAM (Random Access Memory).
 The keyword table TBLa stores a plurality of words, which are broadly divided into nouns and adjectives. The noun words correspond to keywords; the first keyword KW1 and the second keyword KW2 described later are included among the noun words stored in the keyword table TBLa. Each adjective word is stored in association with a noun word. Since an adjective has the function of modifying a noun, the association between an adjective word and a noun word is determined according to this modification relationship. For example, the adjective word "delicious" is associated with the noun word "food and drink".
 FIG. 3 is an explanatory diagram showing the data structure of the noun words stored in the keyword table TBLa. As shown in the figure, the noun words have a tree structure in which a plurality of words are hierarchized by meaning. In this example, the words are classified into a first to a fourth hierarchy, although the number of hierarchies may be four or more.
 The keyword table TBLa also stores a degree of association indicating how strongly one noun word relates to another. The degree of association is determined in consideration not only of superordinate-subordinate relationships but also of the use and function of the object that a word denotes. For example, "sake" and "wine" are both subordinate concepts of "alcoholic beverage". In contrast, an "ochoko" (a small sake cup) is not a subordinate concept of "alcoholic beverage", but an ochoko is used for drinking sake. For this reason, the degree of association between "sake" and "ochoko" is set higher than the degree of association between "sake" and "wine".
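 For illustration only, the following Python sketch shows one possible in-memory layout for the associations described above; the words, scores, and names (ADJECTIVE_TO_NOUN, ASSOCIATION) are hypothetical assumptions, not a layout prescribed by this disclosure.

```python
# One possible in-memory layout for the keyword table TBLa (illustrative).
# Adjectives are stored in association with the noun they modify.
ADJECTIVE_TO_NOUN = {
    "delicious": "food and drink",
}

# Degree of association per unordered pair of noun keywords; higher means
# more strongly related. "sake"/"ochoko" outscores "sake"/"wine", as in
# the example above.
ASSOCIATION = {
    frozenset(["sake", "ochoko"]): 0.9,
    frozenset(["sake", "wine"]): 0.6,
    frozenset(["food and drink", "wine"]): 0.8,
}

def degree_of_association(kw1: str, kw2: str) -> float:
    """Look up the stored degree of association; 0.0 if unregistered."""
    if kw1 == kw2:
        return 1.0  # identical keywords are treated as maximally related
    return ASSOCIATION.get(frozenset([kw1, kw2]), 0.0)

print(degree_of_association("sake", "ochoko"))  # -> 0.9
```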
 Returning to FIG. 2, the communication device 23 is a device that communicates with other devices via the network NW, such as a mobile communication network or the Internet. The communication device 23 may also be referred to as, for example, a network device, a network controller, a network card, or a communication module. The communication device 23 can communicate with the video distribution server 10 via the network NW.
 The output device 24 notifies the user U of various kinds of information under the control of the processing device 21. The output device 24 includes a display device 241 and a speaker 242. The display device 241 displays images; various display panels, such as a liquid crystal display panel or an organic EL (Electro Luminescence) display panel, are suitably used as the display device 241.
 Sound data is supplied from the processing device 21 to the speaker 242. The speaker 242 includes a DA converter, which converts the sound data into an analog signal, and the speaker 242 is driven by this analog signal.
 The input device 25 is a device with which the user U inputs information for using the user device 20, and accepts input operations by the user U. The input device 25 in this example includes a microphone 251 and a touch panel 252. The touch panel 252 detects contact by the user U on the display surface of the display device 241 and, based on the contact position, accepts operations for inputting characters such as numbers and letters and for selecting icons displayed on the display device 241. The microphone 251 converts the voice of the user U into an analog electric signal and outputs the electric signal as a voice signal Sa. The voice signal Sa is converted into a digital signal by an AD converter (not shown) and supplied to the processing device 21 via the bus 28.
 The short-range wireless communication device 26 is a device that communicates with other devices, such as the television receiver 30, by short-range wireless communication such as Bluetooth (registered trademark), ZigBee (registered trademark), or WiFi (registered trademark).
 The GPS device 27 receives radio waves from a plurality of satellites and generates position information from the received radio waves. The position information indicates the position of the user device 20 and may take any format as long as the position can be specified; for example, it indicates the latitude and longitude of the user device 20. Although this embodiment illustrates the case where the position information is obtained from the GPS device 27, the user device 20 may acquire the position information by any other method. For example, the user device 20 may acquire the position information using the cell ID assigned to the base station with which the user device 20 communicates. Alternatively, when the user device 20 communicates with an access point of a wireless LAN (Local Area Network) using the short-range wireless communication device 26, the user device 20 may acquire the position information by referring to a database that associates the identification address on the network assigned to the access point (MAC (Media Access Control) address) with an actual address (position). Alternatively, the user device 20 may use the short-range wireless communication device 26 to receive ID information included in an advertisement packet conforming to the BLE (Bluetooth Low Energy) standard and acquire the position information based on the ID information.
[1.3. Functions of the user device 20]
 FIG. 4 is a functional block diagram showing the functions of the user device 20. By reading the control program PRa from the storage device 22 and executing it, the processing device 21 functions as a first keyword generation unit 210, a second keyword generation unit 220A, a specifying unit 230A, and a comment generation unit 240.
 The first keyword generation unit 210 generates the first keyword KW1 based on the voice of the user U represented by the voice signal Sa.
 Specifically, the first keyword generation unit 210 analyzes the voice of the user U and extracts nouns and adjectives from the analysis result. When the voice of the user U contains both a noun and an adjective, the first keyword generation unit 210 identifies the noun as the attention word. For example, when the voice of the user U is "the red car is suspicious", the noun "car" is identified as the attention word. When the voice of the user U contains an adjective but no noun, the first keyword generation unit 210 identifies the adjective as the attention word. For example, when the voice of the user U is "that looks delicious", the adjective "delicious" is identified as the attention word.
 The first keyword generation unit 210 then determines whether the attention word is included in the keyword table TBLa. If the determination result is negative, the first keyword generation unit 210 does not generate the first keyword KW1; the first keyword KW1 is therefore limited to keywords included in the keyword table TBLa. If the determination result is affirmative and the attention word is a noun, the first keyword generation unit 210 generates the attention word as the first keyword KW1. If the determination result is affirmative and the attention word is an adjective, the first keyword generation unit 210 refers to the keyword table TBLa and generates the noun word associated with the attention word as the first keyword KW1. For example, when the attention word is "delicious", the first keyword generation unit 210 generates "food and drink" as the first keyword KW1.
 In this way, the first keyword generation unit 210 generates the first keyword KW1 related to the utterance of the user U even when that utterance is ambiguous.
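 The selection logic described above can be summarized in the following minimal Python sketch. It assumes the voice has already been converted to text and analyzed into nouns and adjectives; the table contents are illustrative placeholders, not values from this disclosure.

```python
from typing import Optional

# Illustrative fragments of the keyword table TBLa.
NOUN_KEYWORDS = {"car", "food and drink", "wine"}
ADJECTIVE_TO_NOUN = {"delicious": "food and drink"}

def first_keyword(nouns: list[str], adjectives: list[str]) -> Optional[str]:
    """Derive KW1 from the words recognized in the user's utterance.

    A noun takes precedence over an adjective as the attention word; an
    adjective is mapped to its associated noun; an attention word absent
    from the table yields no keyword at all.
    """
    attention = nouns[0] if nouns else (adjectives[0] if adjectives else None)
    if attention is None:
        return None
    if attention in NOUN_KEYWORDS:            # noun already is a keyword
        return attention
    return ADJECTIVE_TO_NOUN.get(attention)   # adjective -> associated noun

# "The red car is suspicious" -> noun "car" wins over adjective "red".
assert first_keyword(["car"], ["red"]) == "car"
# "That looks delicious" -> no noun; "delicious" maps to "food and drink".
assert first_keyword([], ["delicious"]) == "food and drink"
```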
 Next, the second keyword generation unit 220A generates a second keyword KW2 for each of the object images extracted from the image represented by the image signal Sg. The second keyword generation unit 220A includes an extraction unit 221 and a conversion unit 222.
 The extraction unit 221 extracts a plurality of object images from the image represented by the image signal Sg; a single screen of the image typically contains many object images.
 When the image represented by the image signal Sg is the image shown in FIG. 5, the images extracted by the extraction unit 221 are, for example, object images OB1 to OB5.
 The conversion unit 222 converts each of the plurality of object images OB1 to OB5 extracted by the extraction unit 221 into a second keyword KW2, using, for example, an image recognition model trained by machine learning. The second keyword KW2 is limited to keywords stored in the keyword table TBLa. For example, the object image OB1 shown in FIG. 5 is converted into "wine", the object image OB2 into "wine glass", the object image OB3 into "clock", the object image OB4 into "candle", and the object image OB5 into "Western food".
 The specifying unit 230A specifies a target keyword Wx from among the plurality of second keywords KW2 generated by the second keyword generation unit 220A, based on the degree of association indicating how strongly each second keyword KW2 relates to the first keyword KW1. More specifically, the specifying unit 230A refers to the keyword table TBLa and acquires the degree of association for each pair of a second keyword KW2 and the first keyword KW1. The specifying unit 230A then specifies, as the target keyword Wx, the second keyword KW2 contained in the pair with the highest degree of association. As described later, the comment generation unit 240 generates a comment related to the target keyword Wx specified by the specifying unit 230A.
 For example, suppose that the user U looks at the image shown in FIG. 5 and says "that looks delicious", that "food and drink" is generated as the first keyword KW1, and that "wine", "wine glass", "clock", "candle", and "Western food" are generated as the second keywords KW2. In this case, the specifying unit 230A refers to the keyword table TBLa and acquires the degree of association between "food and drink" and each of "wine", "wine glass", "clock", "candle", and "Western food". The specifying unit 230A compares the acquired degrees of association and specifies the second keyword KW2 with the highest degree of association as the target keyword Wx.
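 A minimal sketch of this comparison follows, with the association lookup reduced to an illustrative score table (the numbers are assumptions, not values from the keyword table TBLa):

```python
# Illustrative association scores against KW1 = "food and drink"; chosen
# so that "Western food" wins. Real values would come from TBLa.
SCORES = {"wine": 0.7, "wine glass": 0.4, "clock": 0.1,
          "candle": 0.1, "Western food": 0.9}

def degree(kw1: str, kw2: str) -> float:
    return SCORES.get(kw2, 0.0)

def specify_target_keyword(kw1, kw2_list, degree):
    # Specifying unit 230A: the KW2 with the highest association wins.
    return max(kw2_list, key=lambda kw2: degree(kw1, kw2))

wx = specify_target_keyword(
    "food and drink",
    ["wine", "wine glass", "clock", "candle", "Western food"],
    degree)
print(wx)  # -> Western food
```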
 The comment generation unit 240 generates a comment related to the target keyword Wx. A comment here means an explanation or exposition of the target keyword Wx, and the concept also includes a recommendation; the comment may therefore include information about a product that the user U is encouraged to purchase in relation to the target keyword Wx and about stores handling that product. The comment generation unit 240 generates the comment by reading, from the comment table TBLb, the comment stored in association with the target keyword Wx. Alternatively, the comment generation unit 240 may access a search site connected to the network NW, acquire information related to the target keyword Wx from the search site, and generate the acquired information as the comment. For example, when the target keyword Wx is "ramen", the comment generation unit 240 may search for ramen restaurants near the position indicated by the position information generated by the GPS device 27 or the like, and output the search result as the comment.
[1.4. Operation of the user device 20]
 Next, the operation of the user device 20 will be described. FIG. 6 is a flowchart showing the operation of the user device 20.
 First, the processing device 21 identifies the attention word based on the voice of the user U (step S1). The processing device 21 extracts the attention word by executing voice recognition processing that converts the voice of the user U into text and identification processing that identifies nouns and adjectives in the converted text. The extracted word is a noun, or an adjective when no noun is identified.
 Next, the processing device 21 determines whether the attention word is included in the keyword table TBLa (step S2). When the attention word is not included in the keyword table TBLa, the processing device 21 returns the processing to step S1 and repeats steps S1 and S2 until an attention word included in the keyword table TBLa is identified (that is, until the determination result of step S2 becomes affirmative).
 When the determination result of step S2 is affirmative, the processing device 21 determines whether the attention word is a noun (step S3). When the attention word is a noun, the processing device 21 generates the attention word as the first keyword KW1. Since the processing device 21 extracts a noun or an adjective as the attention word in step S1, the attention word is an adjective when the determination result of step S3 is negative. In this case, the processing device 21 refers to the keyword table TBLa and generates the noun word associated with the attention word as the first keyword KW1 (step S5).
 Next, the processing device 21 extracts object images from the image represented by the image signal Sg (step S6). Since one frame of the image usually contains a plurality of object images, the processing device 21 extracts a plurality of object images in step S6. The processing device 21 then converts each of the extracted object images into a second keyword KW2 (step S7).
 Next, the processing device 21 specifies the target keyword Wx from among the plurality of second keywords KW2 generated in step S7, based on the degree of association between each second keyword KW2 and the first keyword KW1 (step S8).
 Next, the processing device 21 generates a comment related to the target keyword Wx (step S9). In step S9, the processing device 21 generates the comment by reading, from the comment table TBLb, the comment stored in association with the target keyword Wx. The processing device 21 outputs the generated comment by any of the following methods, which may also be combined arbitrarily.
(a) The processing device 21 causes the display device 241 to display the video represented by video data on which an image of the generated comment has been superimposed.
(b) The processing device 21 uses the short-range wireless communication device 26 to transmit, to the television receiver 30, the video data on which the image of the generated comment has been superimposed.
(c) The processing device 21 converts the generated comment into sound data, synthesizes the sound data representing the comment with the sound data of the video, and causes the speaker 242 to emit the synthesized result.
(d) The processing device 21 uses the short-range wireless communication device 26 to transmit the synthesized result of the sound data representing the comment and the sound data of the video to the television receiver 30.
 The processing device 21 functions as the first keyword generation unit 210 in steps S1 to S5, as the extraction unit 221 in step S6, and as the conversion unit 222 in step S7. Further, the processing device 21 functions as the specifying unit 230A in step S8 and as the comment generation unit 240 in step S9.
 As described above, the information processing device exemplified by the user device 20 includes: the first keyword generation unit 210 that generates the first keyword KW1 based on the voice of the user U; the second keyword generation unit 220A that generates a second keyword KW2 for each of the plurality of object images extracted from the image represented by the image signal Sg; the specifying unit 230A that specifies, based on the degree of association between each second keyword KW2 and the first keyword KW1, the target keyword Wx to be the subject of a comment from among the plurality of second keywords KW2 generated by the second keyword generation unit 220A; and the comment generation unit 240 that generates a comment related to the target keyword Wx.
 According to this aspect, a comment can be generated in response to an ambiguous utterance of the user U without the user U having to designate the object image to be commented on.
 Further, the degree of association when a second keyword KW2 matches the first keyword KW1 is higher than the degree of association when a second keyword KW2 does not match the first keyword KW1. Therefore, when any of the plurality of second keywords KW2 matches the first keyword KW1, the specifying unit 230A specifies the matching second keyword KW2 as the target keyword Wx. In this case, the specifying unit 230A determines whether each of the plurality of second keywords KW2 matches the first keyword KW1 and, when the determination result for any of them is affirmative, can specify the matching second keyword KW2 as the target keyword Wx. This makes it unnecessary to refer to the keyword table TBLa to acquire the degrees of association, so the processing load can be reduced.
[2. Second Embodiment]
 The service system 1 of the second embodiment is the same as the service system 1 of the first embodiment except for the functions of the processing device 21 in the user device 20. FIG. 7 is a functional block diagram showing the functions of the processing device 21 of the second embodiment. The processing device 21 of the second embodiment differs from that of the first embodiment in that it includes a second keyword generation unit 220B instead of the second keyword generation unit 220A.
 As shown in FIG. 7, the second keyword generation unit 220B includes the extraction unit 221, the conversion unit 222, and an analysis unit 223. The image signal Sg is supplied to the analysis unit 223, which analyzes the image signal Sg of the video and outputs the analysis result to the extraction unit 221.
 For example, the analysis unit 223 evaluates each of the plurality of object images included in an arbitrary frame of the image signal Sg using first to fourth evaluation items, and outputs the total of the evaluation values to the extraction unit 221 as the analysis result. In this case, the extraction unit 221 extracts the object images whose total evaluation value exceeds a predetermined value.
 The first evaluation item is the ratio of the area of the object image to the area of one screen; the larger the ratio, the higher the evaluation value of the object image. The second evaluation item is the apparent distance of the object image as seen by the user U; the closer to the front the object image is located, the higher its evaluation value. The third evaluation item is the brightness of the object image; the brighter the object image, the higher its evaluation value. The fourth evaluation item is the position of the object image; the closer the object image is to the center of the screen, the higher its evaluation value. The first to fourth evaluation items are all factors that attract the interest of the user U within one screen of the image. By evaluating the object images using a plurality of evaluation items, object images in which the user U is highly interested can be extracted.
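 A minimal sketch of this evaluation and threshold-based extraction follows; the per-item scoring scale and the concrete numbers are assumptions for illustration only:

```python
from dataclasses import dataclass

@dataclass
class ObjectImage:
    label: str
    area_ratio: int   # 1st item: share of the screen area occupied
    proximity: int    # 2nd item: nearer to the viewer scores higher
    brightness: int   # 3rd item: brighter scores higher
    centrality: int   # 4th item: closer to the screen center scores higher

def total_score(ob: ObjectImage) -> int:
    # Analysis unit 223: sum the per-item evaluation values.
    return ob.area_ratio + ob.proximity + ob.brightness + ob.centrality

def extract(objects: list[ObjectImage], threshold: int) -> list[ObjectImage]:
    # Extraction unit 221: keep objects whose total exceeds the
    # predetermined value.
    return [ob for ob in objects if total_score(ob) > threshold]

# With a predetermined value of 13, only salient objects survive; the
# numbers are made up, echoing the OB2/OB5 example around FIG. 9.
frame = [ObjectImage("wine glass", 4, 4, 4, 3),   # total 15
         ObjectImage("clock", 3, 2, 3, 3)]        # total 11
print([ob.label for ob in extract(frame, 13)])    # -> ['wine glass']
```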
 Next, the operation of the user device 20 in the second embodiment will be described. FIG. 8 is a flowchart showing the operation of the user device 20 according to the second embodiment. This flowchart is the same as the flowchart of the first embodiment shown in FIG. 6, except that steps S6_1 and S6_2 are executed instead of step S6. The differences are described below.
 In step S6_1, the processing device 21 functions as the analysis unit 223, acquires a plurality of evaluation values obtained by evaluating each of the plurality of object images included in a frame for each evaluation item, and calculates the total of these evaluation values. For example, the analysis results of the object images OB1 to OB5 shown in FIG. 5 are as shown in FIG. 9. In this example, the total evaluation value of each of the object images OB1 to OB5 is in the range of 11 to 16.
 In step S6_2, the processing device 21 functions as the extraction unit 221 and extracts the object images whose total evaluation value exceeds a predetermined value. For example, suppose that the predetermined value is 13 and that the total evaluation values shown in FIG. 9 are obtained for the object images. In this case, the processing device 21 extracts the object images OB2 and OB5. The processing from step S7 onward is the same as that described in the first embodiment with reference to FIG. 6, so its description is omitted.
 As described above, according to the second embodiment, the second keyword generation unit 220B includes the analysis unit 223 that analyzes the image signal Sg, the extraction unit 221 that extracts a plurality of object images from the image represented by the image signal Sg based on the analysis result of the analysis unit 223, and the conversion unit 222 that converts each of the plurality of object images into a second keyword KW2.
 According to this aspect, since the plurality of object images are extracted based on the analysis result of the image signal Sg, the number of object images to be extracted can be reduced compared with the case where object images are extracted without using the analysis result. The processing load of the conversion unit 222 can therefore be reduced.
 The analysis unit 223 may also analyze the image signal Sg over a plurality of frames and generate the analysis result. In this case, the analysis unit 223 may adopt, in addition to the first to fourth evaluation items, a fifth evaluation item related to the movement of the object image. One example of the fifth evaluation item is the number of frames corresponding to the length of time that a moving object image is present on the screen; according to this item, the larger the number of frames (that is, the longer the object image is present on the screen), the higher the evaluation value of the object image. For example, when the protagonist of a movie moves, the movie is often shot so as to follow the protagonist's movement. Therefore, when the video represented by the image signal Sg is a movie, the evaluation values of the object image representing the protagonist and of the object images representing the protagonist's belongings are raised. As a result, the probability that the extraction unit 221 extracts an object image that the user U is paying attention to can be increased, and conversely, the probability that the extraction unit 221 extracts an object image that the user U is not paying attention to can be reduced.
[3. Third Embodiment]
 The service system 1 of the third embodiment is the same as the service system 1 of the first embodiment except for the functions of the processing device 21 in the user device 20 and the stored contents of the storage device 22. FIG. 10 is a functional block diagram showing the functions of the processing device 21 of the third embodiment. The processing device 21 of the third embodiment differs from that of the first embodiment in that it includes a specifying unit 230B instead of the specifying unit 230A.
 The storage device 22 of the user device 20 of the third embodiment stores an action history table TBLc, in which the action history of the user U is stored. The action history includes the Internet search history of the user U, the purchase history of products and services, SNS (Social Networking Service) activity, Web browser bookmarks, and the like.
 The specifying unit 230B specifies the target keyword Wx from among the plurality of second keywords KW2 generated by the second keyword generation unit 220A, based on the degree of association indicating how strongly each second keyword KW2 relates to the first keyword KW1 and on the action history of the user U.
 First, the specifying unit 230B selects, from among the plurality of second keywords KW2 generated by the second keyword generation unit 220A, the second keywords KW2 whose degree of association with the first keyword KW1 is equal to or greater than a predetermined value; the selected second keywords KW2 become candidates for the target keyword Wx. Next, the specifying unit 230B refers to the action history stored in the action history table TBLc and specifies the target keyword Wx from among the selected second keywords KW2. For example, suppose that the second keywords KW2 selected based on the degree of association are "wine" and "Western food", and that a wine purchase history is recorded in the action history table TBLc. In this case, when the specifying unit 230B refers to the action history table TBLc and detects that the user U has a wine purchase history, it specifies "wine", out of "wine" and "Western food", as the target keyword Wx. As a result, the comment generation unit 240 can generate a comment about "wine".
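 A minimal sketch of this two-stage selection (association threshold, then action history) follows; the threshold, scores, and history contents are illustrative assumptions:

```python
def specify_with_history(kw1, kw2_list, degree, threshold, history_keywords):
    """Specifying unit 230B, sketched: keep the KW2 candidates sufficiently
    associated with KW1, then prefer a candidate that also appears in the
    user's action history."""
    candidates = [kw for kw in kw2_list if degree(kw1, kw) >= threshold]
    for kw in candidates:
        if kw in history_keywords:   # e.g. a recorded purchase of wine
            return kw
    return candidates[0] if candidates else None

# Illustrative data: both candidates clear the threshold, but the purchase
# history tips the choice toward "wine".
degree = lambda a, b: {"wine": 0.8, "Western food": 0.7}.get(b, 0.0)
print(specify_with_history("food and drink",
                           ["wine", "Western food", "clock"],
                           degree, 0.5, {"wine"}))  # -> wine
```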
 Next, the operation of the user device 20 in the third embodiment will be described. FIG. 11 is a flowchart showing the operation of the user device 20 according to the third embodiment. This flowchart is the same as the flowchart of the first embodiment shown in FIG. 6, except that steps S8_1 and S8_2 are executed instead of step S8. The differences are described below.
 In step S8_1, the processing device 21 functions as the specifying unit 230B and selects, from among the plurality of second keywords KW2 generated in step S7, the second keywords KW2 whose degree of association with the first keyword KW1 is equal to or greater than a predetermined value.
 In step S8_2, the processing device 21 functions as the specifying unit 230B and, based on the action history, specifies, as the target keyword Wx, the second keyword KW2 related to the action history from among the second keywords KW2 selected in step S8_1.
 According to the third embodiment, the specifying unit 230B specifies the target keyword Wx from among the plurality of second keywords KW2 based on the degree of association and on the action history of the user U. According to this aspect, since the target keyword Wx is specified in consideration of the action history of the user U, a comment of higher interest to the user U can be provided compared with the case where the action history of the user U is not considered.
 In the operation of the user device 20 described with reference to FIG. 11, the specifying unit 230B first uses the degree of association to select, from among the plurality of second keywords KW2 generated by the second keyword generation unit 220A, the second keywords KW2 that are candidates for the target keyword Wx (step S8_1), and then specifies the target keyword Wx based on the action history (step S8_2); however, this order may be reversed. That is, the specifying unit 230B may select the candidate second keywords KW2 based on the action history and then specify the target keyword Wx using the degree of association. In addition, the specifying unit 230B may specify the target keyword Wx from among the plurality of second keywords KW2 using the action history and the degree of association simultaneously. For example, the specifying unit 230B may add a predetermined value to the degree of association of each second keyword KW2 related to the action history, and compare the resulting degrees of association among the plurality of second keywords KW2 to specify the target keyword Wx.
[4. Fourth Embodiment]
 The service system 1 of the fourth embodiment is the same as the service system 1 of the first embodiment except for the functions of the processing device 21 in the user device 20 and the stored contents of the storage device 22. FIG. 12 is a functional block diagram showing the functions of the processing device 21 of the fourth embodiment. The processing device 21 of the fourth embodiment differs from that of the first embodiment in that it includes a specifying unit 230C instead of the specifying unit 230A.
 The storage device 22 of the user device 20 of the fourth embodiment stores profile data DP and an evaluation table TBLd. The profile data DP indicates the profile of the user U. A profile means the attributes of the user U and includes items such as age and gender.
 The evaluation table TBLd stores, in association with each keyword, an evaluation value for each profile item. The evaluation value indicates the degree of interest of the user U in the keyword. FIG. 13 shows an example of the stored contents of the evaluation table TBLd. For example, for the keyword "car", the evaluation value for the gender "male" is 7, whereas the evaluation value for the gender "female" is 4, indicating that men are more interested in cars than women.
 The specifying unit 230C specifies the target keyword Wx from among the plurality of second keywords KW2 generated by the second keyword generation unit 220A, based on the degree of association indicating how strongly each second keyword KW2 relates to the first keyword KW1 and on the profile of the user U.
 First, the specifying unit 230C selects, from among the plurality of second keywords KW2 generated by the second keyword generation unit 220A, the second keywords KW2 whose degree of association with the first keyword KW1 is equal to or greater than a predetermined value. Next, the specifying unit 230C uses the profile data DP and the evaluation table TBLd to specify the target keyword Wx from among the selected second keywords KW2. Specifically, for each of the selected second keywords KW2, the specifying unit 230C calculates a total evaluation value by summing the evaluation values corresponding to the respective items of the profile of the user U, and specifies the second keyword KW2 with the highest total evaluation value as the target keyword Wx.
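 A minimal sketch of the profile-based step follows; the "car" row echoes the FIG. 13 example, while the other rows and the profile itself are illustrative assumptions:

```python
# EVALUATION plays the role of table TBLd: per keyword, an evaluation
# value per (profile item, item value) pair.
EVALUATION = {
    "car":  {("gender", "male"): 7, ("gender", "female"): 4,
             ("age", "30s"): 6},
    "wine": {("gender", "male"): 5, ("gender", "female"): 6,
             ("age", "30s"): 7},
}

def total_evaluation(keyword: str, profile: dict) -> int:
    """Sum the evaluation values matching each item of the user's profile."""
    row = EVALUATION.get(keyword, {})
    return sum(row.get((item, value), 0) for item, value in profile.items())

def specify_by_profile(candidates: list[str], profile: dict) -> str:
    # Specifying unit 230C, sketched: highest total evaluation wins.
    return max(candidates, key=lambda kw: total_evaluation(kw, profile))

profile = {"gender": "female", "age": "30s"}          # from profile data DP
print(specify_by_profile(["car", "wine"], profile))   # -> wine (6+7 > 4+6)
```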
 Next, the operation of the user device 20 in the fourth embodiment will be described. FIG. 14 is a flowchart showing the operation of the user device 20 according to the fourth embodiment. This flowchart is the same as the flowchart of the third embodiment shown in FIG. 11, except that step S8_3 is executed instead of step S8_2. The difference is described below.
 In step S8_3, the processing device 21 functions as the specifying unit 230C and, based on the profile, specifies, as the target keyword Wx, the second keyword KW2 with the highest total evaluation value for the profile of the user U from among the second keywords KW2 selected in step S8_1.
 According to the fourth embodiment, the specifying unit 230C specifies the target keyword Wx from among the plurality of second keywords KW2 based on the degree of association and on the profile of the user U. According to this aspect, since the target keyword Wx is specified in consideration of the profile of the user U, a comment of higher interest to the user U can be provided compared with the case where the profile of the user U is not considered.
 In the operation of the user device 20 described with reference to FIG. 14, the specifying unit 230C first uses the degree of association to select, from among the plurality of second keywords KW2 generated by the second keyword generation unit 220A, the second keywords KW2 that are candidates for the target keyword Wx (step S8_1), and then specifies the target keyword Wx based on the profile (step S8_3); however, this order may be reversed. That is, the specifying unit 230C may select the candidate second keywords KW2 based on the profile and then specify the target keyword Wx using the degree of association. In addition, the specifying unit 230C may specify the target keyword Wx from among the plurality of second keywords KW2 using the profile and the degree of association simultaneously. For example, the specifying unit 230C may add the total evaluation value based on the profile to the degree of association and compare the respective addition results for the plurality of second keywords KW2 to specify the target keyword Wx.
[5. Modifications]
 The present invention is not limited to the embodiments exemplified above. Specific modes of modification are exemplified below, and two or more modes arbitrarily selected from the following examples may be combined.
 (1) In each of the embodiments described above, the frames in which the extraction unit 221 extracts object images from the image of the image signal Sg may be the following frames.
 First, the extraction unit 221 may extract object images in frames with a high audience rating. In this case, the extraction unit 221 acquires the audience rating in real time from an external device and extracts object images in frames in which the acquired audience rating exceeds a predetermined audience rating. A frame with a high audience rating can be presumed to attract more interest from the user U than other frames. Since object images are thus extracted from the images of frames in which the user U is highly interested, comments useful to the user U can be generated.
 Second, the extraction unit 221 may extract object images in frames in which the user U cheers, based on the voice signal Sa of the user U.
 Third, the extraction unit 221 may extract object images in frames that constitute the subject of a program, based on program information. For example, the extraction unit 221 may use the analysis unit 223 described in the second embodiment to analyze the image signal Sg and identify the frames that constitute the subject of the program. In this case, the analysis unit 223 may acquire the program information from an external device via the network NW.
 Although the image signal Sg has been described in each of the embodiments above as a signal representing a moving image, the image signal Sg may be a signal representing a still image.
(2) In each of the above-described embodiments, the degree of association indicating how strongly the second keyword KW2 is related to the first keyword KW1 is stored in the keyword table TBLa, but the present invention is not limited to this.
For example, the specifying units 230A, 230B, and 230C may obtain the degree of association according to a node count determined from a keyword table TBLa (an example of keyword data) having a tree structure in which a plurality of words are layered by meaning. Specifically, the first keyword generation unit 210 generates a word included in the keyword table TBLa as the first keyword KW1, and the second keyword generation units 220A and 220B generate words included in the keyword table TBLa as the second keywords KW2. The specifying units 230A, 230B, and 230C then acquire, as the degree of association, the number of nodes on the path from the first keyword KW1 to the second keyword KW2 in the tree structure of the keyword table TBLa (that is, the number of nodes traversed after leaving the starting node).
More specifically, assume that the data structure of the keyword table TBLa is the tree structure shown in FIG. 3. For example, when the first keyword KW1 is "liquor" and the second keyword KW2 is "fried potatoes", the path from "liquor" to "fried potatoes" runs node "liquor" → node "drink" → node "food and drink" → node "food" → node "Western food" → node "fried potatoes". The number of nodes on the path from the first keyword KW1 "liquor" to the second keyword KW2 "fried potatoes" is therefore "5". Likewise, when the first keyword KW1 is "food and drink" and the second keyword KW2 is "fried potatoes", the path runs node "food and drink" → node "food" → node "Western food" → node "fried potatoes", so the number of nodes on the path is "3". The fewer the nodes on the path connecting the first keyword KW1 and the second keyword KW2, the higher the degree of association; in the above example, the degree of association between the first keyword KW1 "food and drink" and the second keyword KW2 "fried potatoes" is accordingly higher than that between the first keyword KW1 "liquor" and the second keyword KW2 "fried potatoes".
By specifying the degree of association according to the number of nodes, the storage capacity of the keyword table TBLa required in the user device 20 can be reduced.
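The following is a minimal sketch of this node-count scheme, not an implementation from the embodiments: the tree below is a hypothetical stand-in for the keyword table TBLa of FIG. 3, and the names PARENT, ancestors, and nodes_between are illustrative assumptions.

```python
# Minimal sketch of the node-count degree of association. Each word maps to
# its parent node; the count is the number of nodes traversed after leaving
# the starting node, matching the "5" and "3" in the example above.
PARENT = {
    "liquor": "drink",
    "drink": "food and drink",
    "food": "food and drink",
    "Western food": "food",
    "fried potatoes": "Western food",
}

def ancestors(word):
    """Return [word, parent, grandparent, ..., root]."""
    chain = [word]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def nodes_between(kw1, kw2):
    """Nodes traversed on the tree path from kw1 to kw2 (assumes one tree)."""
    up1, up2 = ancestors(kw1), ancestors(kw2)
    common = next(a for a in up1 if a in up2)  # lowest common ancestor
    return up1.index(common) + up2.index(common)

assert nodes_between("liquor", "fried potatoes") == 5
assert nodes_between("food and drink", "fried potatoes") == 3
```

Since a smaller count means a higher degree of association, a specifying unit built this way would select the second keyword KW2 with the smallest count as the target keyword Wx.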
(3) In each of the above-described embodiments, the extraction unit 221 extracts the object images without considering the action history of the user U; however, it may extract the object images from the image indicated by the image signal Sg based on the action history. In this case, the extraction unit 221 may refer to the action history table TBLc described in the third embodiment, identify, for example, a favorite color of the user U from the purchase history of products, and extract only object images of the identified color. According to this modification, the object images can be narrowed down, so that the processing load of the conversion unit 222 can be reduced. A sketch of such a filter appears below.
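As an illustrative sketch only (the layout of the action history entries and the dominant_color helper are assumptions, not details given in the embodiments), such a color-based filter might look like this:

```python
# Hypothetical filter over candidate object images using a color preference
# inferred from the action history table TBLc.
from collections import Counter

def favorite_color(action_history):
    """Most frequent color among purchase entries, or None if unknown."""
    colors = [e["color"] for e in action_history if e.get("type") == "purchase"]
    return Counter(colors).most_common(1)[0][0] if colors else None

def filter_by_color(object_images, action_history, dominant_color):
    """Keep only object images whose dominant color matches the preference."""
    color = favorite_color(action_history)
    if color is None:
        return object_images  # no known preference: pass everything through
    return [img for img in object_images if dominant_color(img) == color]
```

Passing dominant_color in as a callable keeps the sketch independent of any particular image library.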
(4) The above-described embodiments can be combined as appropriate. For example, the second keyword generation unit 220B of the second embodiment may be used in place of the second keyword generation unit 220A of the third and fourth embodiments.
(5) The block diagrams used in the description of each of the above-described embodiments show blocks in functional units. These functional blocks (components) are realized by an arbitrary combination of hardware and/or software, and the means for realizing each functional block is not particularly limited. That is, each functional block may be realized by one device that is physically and/or logically coupled, or by two or more devices that are physically and/or logically separated and connected directly and/or indirectly (for example, by wire and/or wirelessly). For example, the function of the conversion unit 222 may be provided by a server device connected via the network NW. Similarly, the keyword table TBLa may be provided in the server device. A sketch of such a server-provided conversion appears after the next paragraph.
The word "apparatus" used in the description of each of the above-described embodiments may be read as another term such as "circuit", "device", or "unit".
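As a rough sketch only (the endpoint URL and the JSON response shape are assumptions; the embodiments do not specify a wire protocol), delegating the conversion unit 222 to a server device might look like this:

```python
# Hypothetical client-side stub for a conversion unit 222 provided by a server
# device over the network NW: it posts an extracted object image and receives
# the corresponding second keyword KW2.
import json
import urllib.request

def convert_remote(object_image_bytes, server_url="https://example.com/convert"):
    """Send an object image to the server; return the second keyword."""
    req = urllib.request.Request(
        server_url,
        data=object_image_bytes,
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["keyword"]  # e.g. {"keyword": "fried potatoes"}
```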
(6) The order of the processing procedures, sequences, flowcharts, and the like in each of the above-described embodiments may be changed as long as no contradiction arises. For example, the methods described in this specification present the elements of the various steps in an exemplary order and are not limited to the specific order presented.
(7) In each of the above-described embodiments, input and output information and the like may be stored in a specific location (for example, a memory) or managed in a management table. Input and output information and the like may be overwritten, updated, or appended; output information and the like may be deleted; and input information and the like may be transmitted to another device.
(8) In each of the above-described embodiments, a determination may be made by a value represented by one bit (0 or 1), by a Boolean value (true or false), or by a numerical comparison (for example, a comparison with a predetermined value).
(9) In each of the above-described embodiments, the storage device 22 is a recording medium readable by the processing device 21, and a ROM and a RAM were given as examples; however, the storage device 22 may also be a flexible disk, a magneto-optical disk (for example, a compact disc, a digital versatile disc, or a Blu-ray (registered trademark) disc), a smart card, a flash memory device (for example, a card, a stick, or a key drive), a CD-ROM (Compact Disc-ROM), a register, a removable disk, a hard disk, a floppy (registered trademark) disk, a magnetic strip, a database, a server, or another suitable storage medium. The program may also be transmitted from the network NW, or from a communication network via a telecommunication line.
(10) Each of the above-described embodiments may be applied to systems utilizing LTE (Long Term Evolution), LTE-A (LTE-Advanced), SUPER 3G, IMT-Advanced, 4G, 5G, FRA (Future Radio Access), W-CDMA (registered trademark), GSM (registered trademark), CDMA2000, UMB (Ultra Mobile Broadband), IEEE 802.11 (Wi-Fi), IEEE 802.16 (WiMAX), IEEE 802.20, UWB (Ultra-WideBand), Bluetooth (registered trademark), or other appropriate systems, and/or to next-generation systems extended based on these.
(11) The information, signals, and the like described in each of the above-described embodiments may be represented using any of a variety of different technologies. For example, the data, instructions, commands, information, signals, bits, symbols, chips, and the like that may be mentioned throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or magnetic particles, optical fields or photons, or any combination thereof.
The terms described in this specification and/or the terms necessary for understanding this specification may be replaced with terms having the same or similar meanings.
(12) Each of the functions illustrated in FIGS. 4, 7, 10, and 12 is realized by an arbitrary combination of hardware and software. Each function may be realized by a single device, or by two or more devices configured separately from each other.
(13) The programs illustrated in each of the above-described embodiments should be interpreted broadly to mean instructions, instruction sets, code, code segments, program code, subprograms, software modules, applications, software applications, software packages, routines, subroutines, objects, executable files, threads of execution, procedures, functions, and the like, regardless of whether they are called software, firmware, middleware, microcode, a hardware description language, or by any other name.
Software, instructions, and the like may also be transmitted and received via a transmission medium. For example, when software is transmitted from a website, a server, or another remote source using wired technologies such as coaxial cable, optical fiber cable, twisted pair, and digital subscriber line (DSL), and/or wireless technologies such as infrared, radio, and microwave, these wired and/or wireless technologies are included within the definition of a transmission medium.
(14) In each of the above-described embodiments, the terms "system" and "network" are used interchangeably.
(15) In each of the above-described embodiments, information, parameters, and the like may be represented by absolute values, by relative values from a predetermined value, or by other corresponding information.
(16) In each of the above-described embodiments, the user device 20 may be a mobile station. A mobile station may also be referred to by those skilled in the art as a subscriber station, mobile unit, subscriber unit, wireless unit, remote unit, mobile device, wireless device, wireless communication device, remote device, mobile subscriber station, access terminal, mobile terminal, wireless terminal, remote terminal, handset, user agent, mobile client, client, or by some other suitable term.
(17) In each of the above-described embodiments, the term "connected", or any variation thereof, means any direct or indirect connection or coupling between two or more elements, and can include the presence of one or more intermediate elements between two elements that are "connected" to each other. The connection between elements may be physical, logical, or a combination thereof. As used in this specification, two elements can be considered to be "connected" to each other by using one or more electrical wires, cables, and/or printed electrical connections, and, as some non-limiting and non-exhaustive examples, by using electromagnetic energy such as electromagnetic energy having wavelengths in the radio frequency region, the microwave region, and the light (both visible and invisible) region.
(18) In each of the above-described embodiments, the phrase "based on" does not mean "based only on" unless otherwise specified. In other words, the phrase "based on" means both "based only on" and "based at least on".
(19) Any reference to elements using designations such as "first" and "second" used in this specification does not generally limit the quantity or order of those elements. These designations may be used in this specification as a convenient way of distinguishing between two or more elements. Therefore, a reference to first and second elements does not mean that only two elements may be employed, or that the first element must precede the second element in some way.
(20) To the extent that "including", "comprising", and variations thereof are used in this specification or in the claims, these terms, like the term "provided with", are intended to be inclusive. Furthermore, the term "or" as used in this specification or in the claims is not intended to be an exclusive OR.
(21) Throughout the present application, where articles such as the English "a", "an", and "the" have been added by translation, these articles include the plural unless the context clearly indicates otherwise.
(22) It is obvious to those skilled in the art that the present invention is not limited to the embodiments described in this specification. The present invention can be implemented in modified and altered forms without departing from the spirit and scope of the present invention defined based on the description of the claims. Therefore, the description in this specification is for the purpose of illustration and has no restrictive meaning with respect to the present invention. Furthermore, a plurality of modes selected from the modes exemplified in this specification may be combined.
DESCRIPTION OF REFERENCE SIGNS: 1... service system, 10... video distribution server, 11... processing device, 20... user device, 21... processing device, 22... storage device, 210... first keyword generation unit, 220A, 220B... second keyword generation unit, 221... extraction unit, 222... conversion unit, 223... analysis unit, 230A, 230B, 230C... specifying unit, 240... comment generation unit, KW1... first keyword, KW2... second keyword, TBLa... keyword table, TBLb... comment table, TBLc... action history table, Wx... target keyword.

Claims (5)

  1.  An information processing apparatus comprising:
     a first keyword generation unit that generates a first keyword based on a voice of a user;
     a second keyword generation unit that generates a plurality of second keywords corresponding one-to-one to a plurality of object images extracted from an image indicated by an image signal;
     a specifying unit that specifies, from among the plurality of second keywords, a target keyword to be the subject of a comment, based on a degree of association between each of the plurality of second keywords and the first keyword; and
     a comment generation unit that generates a comment related to the target keyword.
  2.  The information processing apparatus according to claim 1, wherein, among the plurality of second keywords, the degree of association in a case where a second keyword matches the first keyword is higher than the degree of association in a case where a second keyword does not match the first keyword, and
     the specifying unit specifies, as the target keyword, a second keyword that matches the first keyword among the plurality of second keywords.
  3.  The information processing apparatus according to claim 1 or 2, wherein the second keyword generation unit comprises:
     an analysis unit that analyzes the image signal;
     an extraction unit that extracts the plurality of object images from the image indicated by the image signal based on an analysis result of the analysis unit; and
     a conversion unit that converts each of the plurality of object images into a corresponding second keyword among the plurality of second keywords.
  4.  The information processing apparatus according to any one of claims 1 to 3, wherein the specifying unit specifies the target keyword from among the plurality of second keywords based on the degree of association between each second keyword and the first keyword and on an action history of the user.
  5.  The information processing apparatus according to any one of claims 1 to 3, wherein the specifying unit specifies the target keyword from among the plurality of second keywords based on the degree of association between each second keyword and the first keyword and on a profile of the user.