CN109040297B - User portrait generation method and device

Publication number: CN109040297B (granted; application CN201811005336.1A, also published as CN109040297A)
Original language: Chinese (zh)
Inventor: 黄桂志 (Huang Guizhi)
Assignee: Guangzhou Kugou Computer Technology Co., Ltd.
Legal status: Active

Classifications

    • H04L67/55 Push-based network services (under H04L67/00, network arrangements or protocols for supporting network services or applications; H04L67/50, network services)
    • H04L41/08 Configuration management of networks or network elements (under H04L41/00, arrangements for maintenance, administration or management of data switching networks, e.g. packet switching networks)


Abstract

The invention discloses a user portrait generation method and device, belonging to the technical field of networks. The method comprises the following steps: analyzing a video file of an anchor user to acquire fingerprint information of the anchor user, wherein the video file is generated based on video data and audio data collected while the anchor user sings songs during a live broadcast, and the fingerprint information is used to describe the anchor user; acquiring a description tag selected for the anchor user by a viewer user according to the video file, as well as the viewer user's preference information for the anchor user and the video file, wherein the preference information comprises like and dislike; updating the fingerprint information of the anchor user according to the description tag and the preference information; and generating a user portrait of the anchor user according to the updated fingerprint information. The information acquired by this method is richer and more comprehensive, so the generated user portrait is more accurate and reliable.

Description

User portrait generation method and device
Technical Field
The invention relates to the technical field of networks, and in particular to a user portrait generation method and device.
Background
A user portrait, also called a user persona, is an effective tool for characterizing target users and linking user appeals to design directions, and it is widely applied in many fields, for example to obtain the user portrait of an anchor user in live network broadcasting.
At present, for any anchor user, basic attributes of the anchor user, such as gender, age, and educational background, are generally obtained, and some simple behavior information, such as the anchor user's live broadcast duration, may also be obtained. A user portrait of the anchor user is then generated based on the acquired information.
The above technology generates the user portrait of the anchor user only from such basic information; because the acquired information is insufficient, the accuracy and reliability of the user portrait are poor.
Disclosure of Invention
The embodiments of the invention provide a user portrait generation method and device, which can solve the problem in the related art of user portraits with poor accuracy and reliability. The technical scheme is as follows:
In a first aspect, a user portrait generation method is provided, comprising:
analyzing a video file of an anchor user to acquire fingerprint information of the anchor user, wherein the video file is generated based on video data and audio data collected while the anchor user sings songs during a live broadcast, and the fingerprint information is used to describe the anchor user;
acquiring a description tag selected for the anchor user by a viewer user according to the video file, as well as the viewer user's preference information for the anchor user and the video file, wherein the preference information comprises like and dislike;
updating the fingerprint information of the anchor user according to the description tag and the preference information;
and generating a user portrait of the anchor user according to the updated fingerprint information.
In one possible implementation, analyzing the video file of the anchor user to acquire the fingerprint information of the anchor user comprises:
performing sound analysis on the video file and analyzing song information corresponding to the video file to acquire an audio fingerprint of the anchor user, wherein the song information comprises at least one of a song name and a song category, and the audio fingerprint is used to describe the anchor user's song-singing behavior.
In one possible implementation, performing sound analysis on the video file and analyzing the song information corresponding to the video file to acquire the audio fingerprint of the anchor user comprises:
performing sound analysis on the video file, and determining the similarity between the anchor user's rendition of the song corresponding to the video file and the original singer's rendition;
analyzing the song information corresponding to the video file, and counting the number of times the anchor user has sung each song and each category of songs;
and determining, according to the similarities and the counts, at least one of a song name and a song category that the anchor user is adept at singing.
In one possible implementation, performing sound analysis on the video file and analyzing the song information corresponding to the video file to acquire the audio fingerprint of the anchor user comprises:
performing sound analysis on the video file to obtain a singing score for the anchor user's rendition of the song corresponding to the video file;
analyzing the song information corresponding to the video file, and counting the number of times the anchor user has sung each song and each category of songs;
and determining, according to the singing scores and the counts, at least one of a song name and a song category that the anchor user is adept at singing.
In one possible implementation, analyzing the video file of the anchor user to acquire the fingerprint information of the anchor user comprises:
performing character analysis and scene analysis on the video file to acquire a video fingerprint of the anchor user, wherein the video fingerprint is used to describe the anchor user's appearance and the quality of the live video.
In one possible implementation, performing character analysis and scene analysis on the video file to acquire the video fingerprint of the anchor user comprises:
performing face recognition, clothing analysis, and scene analysis on the video file, and determining the video picture resolution and the gender and appearance score of the anchor user.
In one possible implementation, after generating the user portrait of the anchor user according to the updated fingerprint information, the method further comprises:
recommending live broadcast rooms to viewer users according to the user portraits of the anchor users.
In one possible implementation, recommending live broadcast rooms to viewer users according to the user portrait of each anchor user comprises:
when any viewer user is detected logging in to a server, recommending to the viewer user, according to a first historical behavior of the viewer user and the user portrait of each anchor user, the live broadcast room of an anchor user matching the first historical behavior, wherein the first historical behavior comprises the description tags the viewer user has selected for anchor users and the viewer user's preference information for anchor users.
In one possible implementation, recommending live broadcast rooms to viewer users according to the user portrait of each anchor user comprises:
when any viewer user is detected playing a song, recommending to the viewer user, according to a second historical behavior of the viewer user and the user portraits of the anchor users, the live broadcast room of an anchor user matching the second historical behavior, wherein the second historical behavior comprises the viewer user's historical play, like, and favorite operations on songs.
In a second aspect, a user portrait generation method is provided, comprising:
displaying an anchor selection interface, the anchor selection interface being used to provide a plurality of anchor users for a viewer user to select from;
when a selection operation on any anchor user is detected, displaying a video file of the anchor user;
when a play operation on the video file is detected, playing the video file;
during or after playback of the video file, acquiring a description tag selected for the anchor user by the viewer user and the viewer user's preference information for the anchor user and the video file, wherein the preference information comprises like and dislike;
and sending the description tag and the preference information to a server, the description tag and the preference information being used by the server to generate a user portrait of the anchor user.
In one possible implementation, acquiring the description tag selected for the anchor user by the viewer user and the viewer user's preference information for the anchor user and the video file comprises:
displaying a plurality of preset description tags, a first option, and a second option, wherein the first option and the second option correspond to different preference information;
when a selection operation on at least one of the plurality of description tags is detected, taking the at least one description tag as the description tag selected for the anchor user by the viewer user;
and when a selection operation on either of the first option and the second option is detected, taking the preference information corresponding to the selected option as the viewer user's preference information for the anchor user.
In one possible implementation, the method further comprises:
displaying, while the video file of the anchor user is displayed, at least one of a song name and a song category that the anchor user is adept at singing.
In a third aspect, a user portrait generation apparatus is provided, comprising:
an acquisition module, configured to analyze a video file of an anchor user and acquire fingerprint information of the anchor user, wherein the video file is generated based on video data and audio data collected while the anchor user sings songs during a live broadcast, and the fingerprint information is used to describe the anchor user;
the acquisition module is further configured to acquire a description tag selected for the anchor user by a viewer user according to the video file, as well as the viewer user's preference information for the anchor user and the video file, wherein the preference information comprises like and dislike;
an updating module, configured to update the fingerprint information of the anchor user according to the description tag and the preference information;
and a generating module, configured to generate a user portrait of the anchor user according to the updated fingerprint information.
In one possible implementation, the acquisition module is configured to perform sound analysis on the video file and analyze song information corresponding to the video file to acquire an audio fingerprint of the anchor user, where the song information comprises at least one of a song name and a song category, and the audio fingerprint is used to describe the anchor user's song-singing behavior.
In one possible implementation, the acquisition module is configured to:
perform sound analysis on the video file, and determine the similarity between the anchor user's rendition of the song corresponding to the video file and the original singer's rendition;
analyze the song information corresponding to the video file, and count the number of times the anchor user has sung each song and each category of songs;
and determine, according to the similarities and the counts, at least one of a song name and a song category that the anchor user is adept at singing.
In one possible implementation, the acquisition module is configured to:
perform sound analysis on the video file to obtain a singing score for the anchor user's rendition of the song corresponding to the video file;
analyze the song information corresponding to the video file, and count the number of times the anchor user has sung each song and each category of songs;
and determine, according to the singing scores and the counts, at least one of a song name and a song category that the anchor user is adept at singing.
In one possible implementation, the acquisition module is configured to perform character analysis and scene analysis on the video file and acquire a video fingerprint of the anchor user, where the video fingerprint is used to describe the anchor user's appearance and the quality of the live video.
In one possible implementation, the acquisition module is configured to perform face recognition, clothing analysis, and scene analysis on the video file, and determine the video picture resolution and the gender and appearance score of the anchor user.
In one possible implementation, the apparatus further comprises:
a recommending module, configured to recommend live broadcast rooms to viewer users according to the user portraits of the anchor users.
In one possible implementation, the recommending module is configured to, when any viewer user is detected logging in to the server, recommend to the viewer user, according to a first historical behavior of the viewer user and the user portrait of each anchor user, the live broadcast room of an anchor user matching the first historical behavior, where the first historical behavior comprises the description tags the viewer user has selected for anchor users and the viewer user's preference information for anchor users.
In one possible implementation, the recommending module is configured to, when any viewer user is detected playing a song, recommend to the viewer user, according to a second historical behavior of the viewer user and the user portraits of the anchor users, the live broadcast room of an anchor user matching the second historical behavior, where the second historical behavior comprises the viewer user's historical play, like, and favorite operations on songs.
In a fourth aspect, a user portrait generation apparatus is provided, comprising:
a display module, configured to display an anchor selection interface, the anchor selection interface being used to provide a plurality of anchor users for viewer users to select from;
the display module is further configured to display a video file of any anchor user when a selection operation on that anchor user is detected;
a playing module, configured to play the video file when a play operation on the video file is detected;
an obtaining module, configured to acquire, during or after playback of the video file, a description tag selected for the anchor user by a viewer user and the viewer user's preference information for the anchor user and the video file, where the preference information comprises like and dislike;
and a sending module, configured to send the description tag and the preference information to a server, the description tag and the preference information being used by the server to generate a user portrait of the anchor user.
In one possible implementation, the obtaining module is configured to:
display a plurality of preset description tags, a first option, and a second option, where the first option and the second option correspond to different preference information;
when a selection operation on at least one of the plurality of description tags is detected, take the at least one description tag as the description tag selected for the anchor user by the viewer user;
and when a selection operation on either of the first option and the second option is detected, take the preference information corresponding to the selected option as the viewer user's preference information for the anchor user.
In one possible implementation, the display module is further configured to display, while displaying the video file of the anchor user, at least one of a song name and a song category that the anchor user is adept at singing.
In a fifth aspect, a user portrait generation system is provided, the system comprising a first terminal, a server, and a second terminal, wherein:
the first terminal is configured to generate a video file of an anchor user based on video data and audio data collected during a time period in which the anchor user conducts a live singing broadcast, and to send the video file of the anchor user to the server;
the server is configured to analyze the video file of the anchor user and acquire fingerprint information of the anchor user;
the second terminal is configured to, when a play operation on any video file of the anchor user is detected, play the video file corresponding to the play operation, acquire a description tag selected for the anchor user by a viewer user and the viewer user's preference information for the anchor user, and send the description tag and the preference information to the server, wherein the preference information comprises like and dislike;
and the server is further configured to update the fingerprint information of the anchor user according to the description tag and the preference information, and to generate a user portrait of the anchor user according to the updated fingerprint information.
In a sixth aspect, a server is provided, comprising a processor and a memory; the memory is used to store a computer program, and the processor is configured to execute the computer program stored in the memory to implement the method steps of any implementation of the first aspect.
In a seventh aspect, a terminal is provided, comprising a processor and a memory; the memory is used to store a computer program, and the processor is configured to execute the computer program stored in the memory to implement the method steps of any implementation of the second aspect.
In an eighth aspect, a computer-readable storage medium is provided, in which a computer program is stored; when executed by a processor, the computer program implements the method steps of any implementation of the foregoing aspects.
The technical scheme provided by the embodiments of the invention has at least the following beneficial effects:
by analyzing the video file of the anchor user to acquire fingerprint information describing the anchor user, then further acquiring the description tags selected for the anchor user by viewer users together with their preference information for the anchor user and the video file, and generating the user portrait of the anchor user from the updated fingerprint information, richer and more comprehensive information is obtained, so the generated user portrait is more accurate and reliable.
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and those skilled in the art can obtain other drawings based on them without creative effort.
FIG. 1 is a schematic diagram of a user portrait generation system according to an embodiment of the present invention;
FIG. 2 is a flowchart of a user portrait generation method according to an embodiment of the present invention;
FIG. 3 is a flowchart of a user portrait generation method according to an embodiment of the present invention;
FIG. 4 is a flowchart of a user portrait generation method according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a user interface provided by an embodiment of the present invention;
FIG. 6 is a schematic diagram of a recommended live broadcast room according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a recommended live broadcast room according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of a user portrait generation apparatus according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of a user portrait generation apparatus according to an embodiment of the present invention;
FIG. 10 is a schematic structural diagram of a user portrait generation apparatus according to an embodiment of the present invention;
FIG. 11 is a schematic structural diagram of a terminal 1100 according to an embodiment of the present invention;
FIG. 12 is a schematic structural diagram of a server 1200 according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
FIG. 1 is a schematic diagram of a user portrait generation system according to an embodiment of the present invention. Referring to FIG. 1, the user portrait generation system includes: a first terminal 101, a server 102, and a second terminal 103.
The first terminal 101 is the terminal on which the anchor user broadcasts live. During a time period in which the anchor user conducts a live singing broadcast, the first terminal 101 may generate a video file of the anchor user based on the collected video data and audio data; after the anchor user finishes the live broadcast, the first terminal 101 may record song information corresponding to the video file, and send the video file and the corresponding song information to the server 102.
The server 102 is configured to analyze a video file of a anchor user and song information corresponding to the video file, and acquire fingerprint information of the anchor user, where the fingerprint information is used to describe the anchor user.
The second terminal 103 is configured to, when detecting a play operation on any video file of the anchor user, play the video file corresponding to the play operation; the second terminal 103 is further configured to acquire, during or after playback of the video file, the description tag selected for the anchor user by the viewer user and the viewer user's preference information for the anchor user; and the second terminal 103 is further configured to send the description tag and the preference information to the server 102.
The first terminal 101 and the server 102, as well as the server 102 and the second terminal 103, may communicate with each other through a wireless or wired network.
FIG. 2 is a flowchart of a user portrait generation method according to an embodiment of the present invention. Referring to FIG. 2, the method includes:
201. A video file of the anchor user is analyzed to acquire fingerprint information of the anchor user, where the video file is generated based on video data and audio data collected while the anchor user sings songs during a live broadcast, and the fingerprint information is used to describe the anchor user.
202. A description tag selected for the anchor user by a viewer user according to the video file, as well as the viewer user's preference information for the anchor user and the video file, is acquired, where the preference information comprises like and dislike.
203. The fingerprint information of the anchor user is updated according to the description tag and the preference information.
204. A user portrait of the anchor user is generated according to the updated fingerprint information.
According to the method provided by the embodiment of the invention, the video file of the anchor user is analyzed to acquire fingerprint information describing the anchor user; the description tags selected for the anchor user by viewer users, together with their preference information for the anchor user and the video file, are then further acquired, and the user portrait of the anchor user is generated from the updated fingerprint information. The information acquired in this way is richer and more comprehensive, so the generated user portrait is more accurate and reliable.
In one possible implementation, analyzing the video file of the anchor user to acquire the fingerprint information of the anchor user includes:
performing sound analysis on the video file and analyzing song information corresponding to the video file to acquire the audio fingerprint of the anchor user, where the song information comprises at least one of a song name and a song category, and the audio fingerprint is used to describe the anchor user's song-singing behavior.
In one possible implementation, performing sound analysis on the video file and analyzing the song information corresponding to the video file to acquire the audio fingerprint of the anchor user includes:
performing sound analysis on the video file, and determining the similarity between the anchor user's rendition of the song corresponding to the video file and the original singer's rendition;
analyzing the song information corresponding to the video file, and counting the number of times the anchor user has sung each song and each category of songs;
and determining, according to the similarities and the counts, at least one of a song name and a song category that the anchor user is adept at singing.
In one possible implementation, performing sound analysis on the video file and analyzing the song information corresponding to the video file to acquire the audio fingerprint of the anchor user includes:
performing sound analysis on the video file to obtain a singing score for the anchor user's rendition of the song corresponding to the video file;
analyzing the song information corresponding to the video file, and counting the number of times the anchor user has sung each song and each category of songs;
and determining, according to the singing scores and the counts, at least one of a song name and a song category that the anchor user is adept at singing.
In one possible implementation, analyzing the video file of the anchor user to acquire the fingerprint information of the anchor user includes:
performing character analysis and scene analysis on the video file to acquire the video fingerprint of the anchor user, where the video fingerprint is used to describe the anchor user's appearance and the quality of the live video.
In one possible implementation, performing character analysis and scene analysis on the video file to acquire the video fingerprint of the anchor user includes:
performing face recognition, clothing analysis, and scene analysis on the video file, and determining the video picture resolution and the gender and appearance score of the anchor user.
In one possible implementation, after generating the user portrait of the anchor user based on the updated fingerprint information, the method further comprises:
recommending live broadcast rooms to viewer users according to the user portraits of the anchor users.
In one possible implementation, recommending live broadcast rooms to viewer users based on the user portrait of each anchor user includes:
when any viewer user is detected logging in to the server, recommending to the viewer user, according to a first historical behavior of the viewer user and the user portrait of each anchor user, the live broadcast room of an anchor user matching the first historical behavior, where the first historical behavior comprises the description tags the viewer user has selected for anchor users and the viewer user's preference information for anchor users.
In one possible implementation, recommending live broadcast rooms to viewer users based on the user portrait of each anchor user includes:
when any viewer user is detected playing a song, recommending to the viewer user, according to a second historical behavior of the viewer user and the user portraits of the anchor users, the live broadcast room of an anchor user matching the second historical behavior, where the second historical behavior comprises the viewer user's historical play, like, and favorite operations on songs.
All the above optional technical solutions can be combined arbitrarily to form optional embodiments of the present invention, which are not described in detail here.
FIG. 3 is a flowchart of a user portrait generation method according to an embodiment of the present invention. Referring to FIG. 3, the method includes:
301. An anchor selection interface is displayed, the anchor selection interface being used to provide a plurality of anchor users for a viewer user to select from.
302. When a selection operation on any anchor user is detected, a video file of the anchor user is displayed.
303. When a play operation on the video file is detected, the video file is played.
304. During or after playback of the video file, a description tag selected for the anchor user by the viewer user and the viewer user's preference information for the anchor user and the video file are acquired, where the preference information comprises like and dislike.
305. The description tag and the preference information are sent to a server, where they are used by the server to generate a user portrait of the anchor user.
According to the method provided by the embodiment of the invention, the description tag selected for the anchor user by the viewer user and the viewer user's preference information for the anchor user and the video file are acquired from the viewer user's operations and sent to the server, so that the server can update the fingerprint information according to this information and then generate the user portrait of the anchor user.
In one possible implementation, acquiring the description tag selected for the anchor user by the viewer user and the viewer user's preference information for the anchor user and the video file includes:
displaying a plurality of preset description tags, a first option, and a second option, where the first option and the second option correspond to different preference information;
when a selection operation on at least one of the plurality of description tags is detected, taking the at least one description tag as the description tag selected for the anchor user by the viewer user;
and when a selection operation on either of the first option and the second option is detected, taking the preference information corresponding to the selected option as the viewer user's preference information for the anchor user.
In one possible implementation, the method further includes:
displaying, while the video file of the anchor user is displayed, at least one of a song name and a song category that the anchor user is adept at singing.
All the above optional technical solutions can be combined arbitrarily to form optional embodiments of the present invention, which are not described in detail here.
FIG. 4 is a flowchart of a user portrait generation method according to an embodiment of the present invention. Referring to FIG. 4, the method includes:
401. During a time period in which the anchor user conducts a live singing broadcast, the first terminal generates a video file of the anchor user based on the collected video data and audio data.
In the embodiment of the invention, a specified application may be installed on the first terminal; the specified application may be a live broadcast tool used by the anchor user to broadcast live. For example, the specified application may be an accompaniment application for playing song accompaniments. If the anchor user wants to sing a certain song during the live broadcast, the anchor user can select the song on the live broadcast interface displayed by the first terminal and then trigger its playback; when the play operation is detected, the first terminal plays the accompaniment of the song. The anchor user can start singing when the first terminal starts playing the accompaniment, and stop singing when the first terminal stops playing it.
When the anchor user triggers the first terminal to play the accompaniment of a song and starts singing, the first terminal can begin collecting video data and audio data of the live broadcast; when the first terminal stops playing the accompaniment and the anchor user finishes singing, the first terminal can stop collecting. The first terminal then generates a video file based on the video data and audio data collected during this live time period and uses it as a video file of the anchor user. A rough sketch of this capture lifecycle is given below.
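The following Python sketch ties the start and stop of data collection to accompaniment playback. It is a minimal illustration under assumed APIs: the patent names no classes or methods, so SingingSessionRecorder, the capture object, and mux_to_file are hypothetical stand-ins for whatever media APIs the first terminal actually uses.

```python
# Hypothetical sketch of step 401; none of these names come from the patent.

def mux_to_file(video_data: bytes, audio_data: bytes, tag: str) -> str:
    """Stand-in muxer: combine the captured streams into one video file."""
    path = f"{tag}.mp4"
    # Real muxing (e.g., via an ffmpeg pipeline) is elided in this sketch.
    return path

class SingingSessionRecorder:
    """Starts and stops capture in step with the song accompaniment."""

    def __init__(self, capture):
        self.capture = capture  # assumed object wrapping camera + microphone
        self.current_song = None

    def on_accompaniment_started(self, song_name: str) -> None:
        # The anchor starts singing when the accompaniment starts playing,
        # so collection of video and audio data begins here.
        self.current_song = song_name
        self.capture.start(video=True, audio=True)

    def on_accompaniment_stopped(self) -> str:
        # The performance ends with the accompaniment: stop collecting and
        # generate the anchor user's video file from the collected data.
        video_data, audio_data = self.capture.stop()
        return mux_to_file(video_data, audio_data, tag=self.current_song)
```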
402. After the anchor user finishes the live singing broadcast, the first terminal records song information corresponding to the video file, the song information comprising at least one of a song name and a song category.
In the embodiment of the present invention, after the anchor user sings a song, the first terminal may record the song information of that performance, including but not limited to at least one of the song name and the song category. Besides the song information, the first terminal may also record other data about the performance, such as whether the anchor sang it as a chorus with another user over a connected-mic session.
It should be noted that step 402 is optional; recording the song information of the songs the anchor user performs helps in analyzing which songs the anchor user is adept at singing.
403. The first terminal sends the video file and the song information corresponding to the video file to the server.
The embodiment of the present invention is described by taking as an example the case where the first terminal generates the video file and sends it to the server; it can be understood that the first terminal may instead send only the collected video data and audio data to the server, with the server generating the video file from them.
In step 403, the first terminal sends both the video file and the corresponding song information to the server; in practice, since step 402 is optional, the first terminal may send only the video file without the song information.
404. The server analyzes the video file of the anchor user to acquire fingerprint information of the anchor user, where the video file is generated based on video data and audio data collected while the anchor user sings songs during a live broadcast, and the fingerprint information is used to describe the anchor user.
In one possible implementation, the fingerprint information includes an audio fingerprint, and analyzing the video file of the anchor user to acquire the fingerprint information comprises: performing sound analysis on the video file and analyzing the song information corresponding to the video file to acquire the audio fingerprint of the anchor user, where the audio fingerprint is used to describe the anchor user's song-singing behavior.
The song information corresponding to the video file may be sent to the server by the first terminal, or may be obtained by the server through song identification on the video file.
In one possible implementation, acquiring the audio fingerprint of the anchor user includes, but is not limited to, the following two modes:
In the first mode, sound analysis is performed on the video file to determine the similarity between the anchor user's rendition of the song corresponding to the video file and the original singer's rendition; the song information corresponding to the video file is analyzed to count the number of times the anchor user has sung each song and each category of songs; and at least one of a song name and a song category that the anchor user is adept at singing is determined according to the similarities and the counts.
If the anchor user has multiple video files, then for each video file (taking video file 1, whose corresponding song is song A, as an example), the server may perform sound analysis on the video file to obtain the similarity between the anchor user's rendition and the original singer's rendition of song A. In addition, the server may analyze the song information corresponding to the video files and count the number of times the anchor user has sung each song and each category of songs, where the count for a category may be the sum of the counts of all songs belonging to that category. The server can then combine these counts with the anchor user's similarity to the original singer of each song to obtain the songs the anchor user is adept at singing, and hence the adept song names and song categories. For example, when the number of times the anchor user has sung a song reaches a count threshold and the average similarity between the anchor user and the original singer of that song reaches a similarity threshold, the server may treat the song as one the anchor user is adept at singing and take its name as an adept song name, where the average similarity may be the average of the similarities over multiple performances. Similarly, when the number of songs of a certain category sung by the anchor user reaches the count threshold and the average similarity for each song in that category reaches the similarity threshold, the server may treat that category as one the anchor user is adept at singing.
In the second mode, sound analysis is performed on the video file to obtain a singing score for the anchor user's rendition of the song corresponding to the video file; the song information corresponding to the video file is analyzed to count the number of times the anchor user has sung each song and each category of songs; and at least one of a song name and a song category that the anchor user is adept at singing is determined according to the singing scores and the counts.
In this mode, for each video file of the anchor user, the server can perform sound analysis on the video file and apply a preset scoring rule to obtain the singing score of the anchor user's rendition of the corresponding song; the server thus obtains a singing score for each song the anchor user has sung. The server can then combine these scores with the number of times the anchor user has sung each song and each category of songs to derive the songs and song categories the anchor user is adept at singing.
For example, when the number of times the anchor user has sung a song reaches the count threshold and the average singing score of that song reaches a score threshold, the server may take the song's name as an adept song name, where the average singing score may be the average of the scores over multiple performances. Similarly, when the number of songs of a certain category sung by the anchor user reaches the count threshold and the average singing score of each song in that category reaches the score threshold, the server may treat that category as one the anchor user is adept at singing.
After acquiring at least one of the song names and song categories that the anchor user is adept at singing, the server can initialize an audio fingerprint for the anchor user, that is, take at least one of the adept song names and song categories as the anchor user's initial audio fingerprint. Both modes reduce to the same count-plus-quality rule; a minimal sketch is given below.
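The following is a minimal sketch of that rule, covering both modes at once: in mode one the per-performance quality value is the similarity to the original singer, in mode two it is the (normalized) singing score. The record layout and both threshold values are illustrative assumptions, not figures from the patent.

```python
from collections import defaultdict
from statistics import mean

# Assumed thresholds; the patent names the thresholds but gives no values.
COUNT_THRESHOLD = 5      # minimum number of performances
QUALITY_THRESHOLD = 0.8  # minimum average similarity / normalized score

def adept_songs(performances):
    """Return song names the anchor user is adept at singing.

    `performances` is an iterable of dicts like
    {"song": "Song A", "category": "pop", "quality": 0.92},
    where `quality` is the similarity (mode one) or score (mode two).
    """
    by_song = defaultdict(list)
    for p in performances:
        by_song[p["song"]].append(p["quality"])
    # A song qualifies when it was sung often enough AND the average
    # quality over those performances reaches the threshold.
    return [song for song, scores in by_song.items()
            if len(scores) >= COUNT_THRESHOLD
            and mean(scores) >= QUALITY_THRESHOLD]

def adept_categories(performances):
    """Same rule applied per category; the count for a category is the
    sum over all songs belonging to it, as described above."""
    by_cat = defaultdict(list)
    for p in performances:
        by_cat[p["category"]].append(p["quality"])
    return [cat for cat, scores in by_cat.items()
            if len(scores) >= COUNT_THRESHOLD
            and mean(scores) >= QUALITY_THRESHOLD]
```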
In one possible implementation, the fingerprint information includes a video fingerprint, and analyzing the video file of the anchor user to acquire the fingerprint information comprises: performing character analysis and scene analysis on the video file to acquire the video fingerprint of the anchor user, where the video fingerprint is used to describe the anchor user's appearance and the quality of the live video.
In one possible implementation, the video fingerprint acquisition process may include: performing face recognition, clothing analysis, and scene analysis on the video file, and determining the video picture resolution and the gender and appearance score of the anchor user.
The appearance score is a scoring attribute, and the server can obtain the anchor user's appearance score by applying the scoring rule for that attribute. After obtaining the video picture resolution and the anchor user's gender and appearance score, the server may initialize a video fingerprint for the anchor user, that is, take the video picture resolution, gender, and appearance score as the anchor user's initial video fingerprint. A rough sketch of this initialization is given below.
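As a sketch of this initialization, the code below reads the video resolution and runs a basic face detector. The patent specifies no library or models, so OpenCV is an assumed choice, and the gender and appearance-score fields are stubbed placeholders rather than real analyses.

```python
import cv2  # OpenCV: one plausible toolkit; the patent names no library

def initial_video_fingerprint(video_path: str) -> dict:
    """Assumed sketch of step 404's video-fingerprint initialization.

    Only the resolution part is concrete; gender and appearance scoring
    would need dedicated models and are stubbed out here.
    """
    cap = cv2.VideoCapture(video_path)
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    ok, frame = cap.read()
    cap.release()

    face_found = False
    if ok:
        detector = cv2.CascadeClassifier(
            cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        face_found = len(detector.detectMultiScale(gray)) > 0

    return {
        "resolution": (width, height),
        # Placeholders: real gender / appearance models are out of scope.
        "gender": "unknown",
        "appearance_score": 0.5 if face_found else 0.0,
    }
```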
It should be noted that step 404 may include only the audio fingerprint acquisition process above, only the video fingerprint acquisition process above, or both.
Step 404 is the process in which the server initializes the audio fingerprint and video fingerprint for the anchor user; the server may further supplement these initial fingerprints, as described in steps 405 to 409 below.
405. The second terminal displays an anchor selection interface, and when a selection operation on any anchor user is detected, displays the video files of that anchor user; the anchor selection interface is used to provide a plurality of anchor users for viewer users to select from.
In the embodiment of the present invention, the second terminal may provide an anchor selection interface in which a viewer user can perform a selection operation on any anchor user, triggering the second terminal to display that anchor user's video files together with the song names and categories the anchor user is adept at singing.
It should be noted that the second terminal may display all video files of the anchor user, a specified number of them, or only those within a specified time period, for example the video file of the anchor user's most recent broadcast.
Optionally, while displaying the video files of the anchor user, the second terminal may simultaneously display at least one of the song names and song categories the anchor user is adept at singing. By displaying these, the viewer user can learn the content orientation of a video file and thus decide whether to watch it.
406. When a play operation on a video file is detected, the second terminal plays the video file.
In the embodiment of the invention, if the viewer user wants to watch any video file of the anchor user, the viewer user can perform a play operation on that video file, for example clicking it, to trigger the second terminal to play it.
407. During or after playback of the video file, the second terminal acquires the description tag selected for the anchor user by the viewer user and the viewer user's preference information for the anchor user and the video file, the preference information comprising like and dislike.
In one possible implementation, step 407 may include the following steps a to c:
Step a: the second terminal displays a plurality of preset description tags, a first option, and a second option, where the first option and the second option correspond to different preference information.
The description tags may include tags describing the anchor user's appearance, such as "oval face", and tags describing the anchor user's voice, such as "sweet voice". The first option may be a like option and the second option a dislike option, the like option indicating that the viewer likes the anchor user and the anchor user's video files, and the dislike option indicating the opposite.
Referring to FIG. 5, FIG. 5 is a schematic diagram of a user interface provided by an embodiment of the present invention. As shown in FIG. 5, the user interface may be an appearance-rating interface in which a plurality of preset description tags, such as "oval face", "sweet voice", and "anime style", may be displayed together with two options, one shaped like a cross and one shaped like a heart.
The second terminal may display the preset description tags and the first and second options while playing the video file, or after playback finishes. Correspondingly, the viewer user may select description tags for the anchor user and choose one of the two options during playback on the second terminal, or may perform the tag selection and option selection operations after playback ends.
Step b: when a selection operation on at least one of the plurality of description tags is detected, the second terminal takes the at least one description tag as the description tag selected for the anchor user by the viewer user.
If the viewer user feels that one or more of the description tags describe the anchor user, the viewer user can select them, whereupon the second terminal obtains the description tags the viewer user selected for the anchor user.
Step c: when a selection operation on either the first option or the second option is detected, the second terminal takes the preference information corresponding to that option as the viewer user's preference information for the anchor user.
If a selection operation on the first option is detected, the second terminal may consider that the viewer user likes the anchor user and the anchor user's video file, that is, the viewer user's preference information for the anchor user is "like". If a selection operation on the second option is detected, the second terminal may consider that the viewer user dislikes them, that is, the preference information is "dislike".
408. The second terminal sends the description tag and the preference information to the server.
In the embodiment of the present invention, the second terminal may send to the server the description tags the viewer user selected for the anchor user and the viewer user's preference information for the anchor user and the video file. One plausible shape for such a message is sketched below.
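The patent does not specify a wire format, so every field name in this sketch is an assumption; it only illustrates what the step 408 message could carry.

```python
import json

# Illustrative feedback message for step 408; all field names are assumed.
feedback = {
    "viewer_id": "viewer-b",
    "anchor_id": "anchor-a",
    "video_file_id": "video-1",
    "description_tags": ["oval face", "sweet voice"],  # chosen in step b
    "preference": "like",  # "like" (first option) or "dislike" (second option)
}
payload = json.dumps(feedback)
# The second terminal would then transmit `payload` to the server.
```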
409. After acquiring the description tags selected for the anchor user by viewer users according to the video file, together with their preference information for the anchor user and the video file, the server updates the fingerprint information of the anchor user according to the description tags and the preference information.
In the embodiment of the invention, the server can update the anchor user's fingerprint information using the description tags selected for the anchor user by a large number of viewer users and their preference information for the anchor user, that is, add the description tags and preference information to the initial fingerprint information to obtain more complete fingerprint information for the anchor user.
Fingerprint information is initialized for the anchor user in step 404 and then supplemented in step 409, making the anchor user's fingerprint information more comprehensive and rich. A minimal sketch of this supplementation follows.
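A minimal sketch of the update, assuming the fingerprint is held as a dict and that tag and preference tallies are an adequate representation (the patent does not fix one):

```python
from collections import Counter

def update_fingerprint(fingerprint: dict, tags: list, preference: str) -> dict:
    """Fold one viewer's feedback (step 409) into the initial fingerprint.

    `fingerprint` is the dict initialized in step 404; the counters added
    here are an assumed representation, not one given by the patent.
    """
    fingerprint.setdefault("tag_counts", Counter()).update(tags)
    prefs = fingerprint.setdefault("preference_counts", Counter())
    prefs[preference] += 1  # preference is "like" or "dislike"
    return fingerprint
```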
410. The server generates a user portrait of the anchor user according to the updated fingerprint information.
In the embodiment of the present invention, the updated fingerprint information includes at least one of the song names and song categories the anchor user is adept at singing, the video picture resolution of the anchor user's live broadcast, the anchor user's gender and appearance score, the description tags selected for the anchor user by viewer users, their preference information for the anchor user and the video files, and the like; all of this information can be used to describe the anchor user. In one possible implementation, the server's portrait generation process includes: generating the user portrait of the anchor user from the updated fingerprint information using an information fusion algorithm. A toy stand-in for such a fusion step is sketched below.
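The patent does not describe the information fusion algorithm itself, so the following is only a toy stand-in: it summarizes the updated fingerprint fields (as represented in the earlier sketches, which are themselves assumptions) into one portrait dict.

```python
def generate_portrait(fingerprint: dict, top_k: int = 3) -> dict:
    """Toy stand-in for the unspecified information fusion algorithm of
    step 410: summarize the updated fingerprint into a user portrait."""
    tag_counts = fingerprint.get("tag_counts", {})
    prefs = fingerprint.get("preference_counts", {})
    total = prefs.get("like", 0) + prefs.get("dislike", 0)
    return {
        # Audio fingerprint: songs/categories the anchor is adept at singing.
        "adept_songs": fingerprint.get("adept_songs", []),
        "adept_categories": fingerprint.get("adept_categories", []),
        # Video fingerprint: resolution, gender, appearance score.
        "resolution": fingerprint.get("resolution"),
        "gender": fingerprint.get("gender"),
        "appearance_score": fingerprint.get("appearance_score"),
        # Viewer feedback: the most frequently chosen tags become labels.
        "top_tags": sorted(tag_counts, key=tag_counts.get, reverse=True)[:top_k],
        "like_ratio": prefs.get("like", 0) / total if total else None,
    }
```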
411. The server recommends the live broadcast rooms of anchor users to viewer users according to the user portraits of the anchor users.
In the embodiment of the present invention, the server may execute steps 401 to 410 above for each anchor user to obtain each anchor user's portrait, and then apply the portraits to live broadcast room recommendation. In one possible implementation, the recommendation process includes, but is not limited to, the following two modes:
In the first mode, when any viewer user is detected logging in to the server, the live broadcast room of an anchor user matching the viewer user's first historical behavior is recommended to the viewer user according to that first historical behavior and the user portrait of each anchor user, the first historical behavior comprising the description tags the viewer user has selected for anchor users and the viewer user's preference information for anchor users.
The user portrait of an anchor user can reflect the description tags viewer users have selected for the anchor user and their preference information for the anchor user; if the user portrait of some anchor user and the first historical behavior contain the same description tags and preference information, the server may consider that anchor user to match the viewer user's first historical behavior and recommend that anchor user's live broadcast room to the viewer user. Take anchor user A and viewer user B as an example: the description tags viewer users have selected for anchor user A include tag 1 and tag 2, and the viewers who selected tag 1 and tag 2 have "like" as their preference information for anchor user A; the tags viewer user B has selected for anchor users include tag 1, and viewer user B's preference information for the anchor users corresponding to tag 1 is "like"; the server can therefore recommend anchor user A's live broadcast room to viewer user B.
The application scenario of this mode is that when a viewer logs in to the server of the live broadcast platform and enters the platform's home page, the home page can display some recommended live broadcast rooms. Referring to FIG. 6, FIG. 6 is a schematic diagram of a recommended live broadcast room provided by an embodiment of the present invention. As shown in FIG. 6, the home page may display multiple live broadcast rooms, from anchor user A and anchor user B through anchor user F, presented in a "guess you like" section; these are the live broadcast rooms of anchor users whose user portraits match the viewer user's habits, as determined by the server from the description tags the viewer user has selected for anchor users and the viewer user's like behavior toward anchor users. A minimal matching sketch is given below.
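A minimal sketch of this first matching mode, under the assumption that a viewer's first historical behavior can be reduced to a set of tags the viewer has selected with a "like" preference (the patent implies such a record without fixing its form):

```python
def matches_first_history(portrait: dict, liked_tags: set) -> bool:
    """An anchor matches when the viewer's liked tags overlap the tags in
    the anchor's portrait (the `top_tags` field from the fusion sketch)."""
    return bool(set(portrait.get("top_tags", [])) & liked_tags)

def recommend_on_login(liked_tags: set, portraits: dict) -> list:
    """Return the anchor IDs whose live rooms to show on the home page."""
    return [anchor_id for anchor_id, portrait in portraits.items()
            if matches_first_history(portrait, liked_tags)]
```

With liked_tags = {"tag 1"} and a portrait for anchor user A whose top tags contain "tag 1", anchor user A's room is returned, mirroring the anchor user A / viewer user B example above.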
In the second mode, when any viewer user is detected playing a song, the live broadcast room of an anchor user matching the viewer user's second historical behavior is recommended to the viewer user according to that second historical behavior and the user portraits of the anchor users, the second historical behavior comprising the viewer user's historical play, like, and favorite operations on songs.
The user portrait of an anchor user can reflect the songs the anchor user is adept at singing; if the user portrait of some anchor user contains the same songs as the second historical behavior, that anchor user may be considered to match the viewer user's second historical behavior. Take anchor user A and viewer user B as an example: the songs anchor user A is adept at singing include song 1 and song 2, and viewer user B has performed a play, like, or favorite operation on song 1, so the server can recommend anchor user A's live broadcast room to viewer user B.
The application scenario of this mode is that when a viewer user plays songs in a music application, the server can recommend the live broadcast rooms of anchor users whose user portraits correspond to the songs and playlists the viewer user has listened to, liked, and favorited in the music application. Referring to FIG. 7, FIG. 7 is a schematic diagram of a recommended live broadcast room according to an embodiment of the present invention; as shown in FIG. 7, the live broadcast room of an anchor user recommended to the viewer user may be displayed in the upper-right area of the song playing interface. A sketch of this song-based matching follows.
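A corresponding sketch of the second mode, again assuming a set-based record of the songs the viewer has played, liked, or favorited:

```python
def recommend_on_song_play(current_song: str, song_history: set,
                           portraits: dict) -> list:
    """Return anchors adept at songs the viewer has interacted with.

    `song_history` stands for the viewer's historical play / like /
    favorite operations on songs; the currently played song is included.
    """
    listened = song_history | {current_song}
    return [anchor_id for anchor_id, portrait in portraits.items()
            if listened & set(portrait.get("adept_songs", []))]
```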
It should be noted that step 411 is an optional step, and by recommending a live broadcast room that may be interested by the user of the audience according to the user representation of the anchor user, the accuracy of information recommendation can be improved.
According to the method provided by the embodiment of the invention, the video file of the anchor user is analyzed to obtain fingerprint information describing the anchor user; the description labels selected by audience users for the anchor user and their preference information for the anchor user and the video file are then obtained; and after the fingerprint information is updated with this information, the user portrait of the anchor user is generated.
FIG. 8 is a schematic structural diagram of a user portrait generation apparatus according to an embodiment of the present invention. Referring to fig. 8, the apparatus includes:
an obtaining module 801, configured to analyze a video file of an anchor user and obtain fingerprint information of the anchor user, where the video file is generated based on video data and audio data collected when the anchor user carries out song singing and live broadcasting, and the fingerprint information is used to describe the anchor user;
the obtaining module 801 is further configured to obtain the description labels selected by audience users for the anchor user according to the video file, and the audience users' preference information for the anchor user and the video file, where the preference information includes likes and dislikes;
an updating module 802, configured to update the fingerprint information of the anchor user according to the description tag and the preference information;
a generating module 803, configured to generate a user portrait of the anchor user according to the updated fingerprint information.
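Read as a data flow, modules 801-803 form a short pipeline: analyze the video file into fingerprint information, fold in the audience feedback, and emit the portrait. The sketch below mirrors that flow with placeholder bodies; none of the function internals are specified by the patent.

```python
# Illustrative pipeline mirroring modules 801 (obtain), 802 (update), 803 (generate).
def analyze_video_file(video_file):
    """Module 801: derive fingerprint information (audio and video fingerprints).
    Placeholder body; the real analysis is described in the embodiments."""
    return {"audio_fingerprint": {}, "video_fingerprint": {}}

def update_fingerprint(fingerprint, labels, preference):
    """Module 802: merge description labels and like/dislike info into the fingerprint."""
    fingerprint["labels"] = labels
    fingerprint["preference"] = preference
    return fingerprint

def generate_portrait(fingerprint):
    """Module 803: assemble the anchor user's portrait from the updated fingerprint."""
    return {"anchor_portrait": fingerprint}

fp = analyze_video_file("anchor_a_live.mp4")  # file name is illustrative
fp = update_fingerprint(fp, labels=["sweet voice"], preference="like")
portrait = generate_portrait(fp)
```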
In one possible implementation, the obtaining module 801 is configured to:
performing sound analysis on the video file and analyzing the song information corresponding to the video file to obtain the audio fingerprint of the anchor user, where the song information includes at least one of a song name and a song category, and the audio fingerprint is used to describe the song singing behavior of the anchor user.
In one possible implementation, the obtaining module 801 is configured to:
performing sound analysis on the video file, and determining the similarity between the anchor user's singing and the original singer's performance of the song corresponding to the video file;
analyzing the song information corresponding to the video file, and counting the number of times the anchor user has sung each song and each category of song;
and determining, according to the similarity and the counts, at least one of the song names and song categories that the anchor user is good at singing.
In one possible implementation, the obtaining module 801 is configured to:
performing sound analysis on the video file to obtain a singing score for the anchor user's performance of the song corresponding to the video file;
analyzing the song information corresponding to the video file, and counting the number of times the anchor user has sung each song and each category of song;
and determining, according to the singing score and the counts, at least one of the song names and song categories that the anchor user is good at singing.
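Both variants combine a per-performance quality measure (the similarity to the original singer, or a singing score) with how often the anchor sings each song. A hedged sketch follows, with thresholds chosen arbitrarily for illustration; they are not values from the patent.

```python
# Illustrative "good at singing" selection: a song qualifies when the anchor
# sings it often enough and the average quality (similarity or singing score,
# normalized to [0, 1]) is high enough. Thresholds are assumptions.
from collections import Counter, defaultdict

def songs_good_at(performances, quality_threshold=0.8, min_count=3):
    counts = Counter(name for name, _category, _q in performances)
    quality = defaultdict(list)
    for name, _category, q in performances:
        quality[name].append(q)
    return [name for name, n in counts.items()
            if n >= min_count and sum(quality[name]) / n >= quality_threshold]

perf = [("song1", "pop", 0.9)] * 4 + [("song2", "rock", 0.5)] * 5
print(songs_good_at(perf))  # ['song1']
```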
In one possible implementation manner, the obtaining module 801 is configured to perform character analysis and scene analysis on the video file and obtain a video fingerprint of the anchor user, where the video fingerprint is used to describe the appearance of the anchor user and the quality of the live video.
In one possible implementation, the obtaining module 801 is configured to perform face recognition, clothing analysis, and scene analysis on the video file, and determine the video frame resolution and the gender and face-value score of the anchor user.
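As one concrete (and purely illustrative) way to extract such a video fingerprint, the frame resolution and the presence of a face can be read with OpenCV; gender classification and face-value scoring would require trained models that the patent does not specify, so they are left as placeholders below.

```python
import cv2  # OpenCV; install with `pip install opencv-python`

def extract_video_fingerprint(path):
    """Sketch: read the frame resolution and detect whether a face is visible.
    Haar-cascade detection stands in for the unspecified face analysis."""
    cap = cv2.VideoCapture(path)
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    ok, frame = cap.read()
    face_found = False
    if ok:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        detector = cv2.CascadeClassifier(
            cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
        face_found = len(detector.detectMultiScale(gray)) > 0
    cap.release()
    return {"resolution": (width, height), "face_detected": face_found,
            "gender": None, "face_score": None}  # placeholders for model outputs
```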
In one possible implementation, referring to fig. 9, the apparatus further includes:
a recommending module 804, configured to recommend live broadcast rooms to audience users according to the user portraits of the anchor users.
In one possible implementation manner, the recommending module 804 is configured to, when it is detected that any audience user logs in to the server, recommend to the audience user a live broadcast room of an anchor user matching a first historical behavior of the audience user, according to the first historical behavior and the user portrait of each anchor user, where the first historical behavior includes the description labels selected by the audience user for anchor users and the audience user's preference information for those anchor users.
In one possible implementation manner, the recommending module 804 is configured to, when it is detected that any audience user plays a song, recommend to the audience user a live broadcast room of an anchor user matching a second historical behavior of the audience user, according to the second historical behavior and the user portrait of each anchor user, where the second historical behavior includes the audience user's historical play, favorite, and collection operations on songs.
In the embodiment of the invention, the video file of the anchor user is analyzed to obtain fingerprint information describing the anchor user; the description labels selected by audience users for the anchor user and their preference information for the anchor user and the video file are then obtained; and after the fingerprint information is updated with this information, the user portrait of the anchor user is generated.
FIG. 10 is a schematic structural diagram of a user portrait generation apparatus according to an embodiment of the present invention. Referring to fig. 10, the apparatus includes:
a display module 1001, configured to display an anchor selection interface, where the anchor selection interface is used to provide a plurality of anchor users for selection by an audience user;
the display module 1001 is further configured to display the video file of any anchor user when a selection operation on that anchor user is detected;
a playing module 1002, configured to play the video file when a playing operation on the video file is detected;
an obtaining module 1003, configured to obtain, during or after playing of the video file, the description labels selected by the audience user for the anchor user and the audience user's preference information for the anchor user and the video file, where the preference information includes likes and dislikes;
a sending module 1004, configured to send the description labels and the preference information to a server, where the description labels and the preference information are used by the server to generate a user portrait of the anchor user.
In one possible implementation, the obtaining module 1003 is configured to:
displaying a plurality of preset description labels, a first option, and a second option, where the first option and the second option correspond to different preference information;
when a selection operation on at least one of the plurality of description labels is detected, taking the at least one description label as the description labels selected by the audience user for the anchor user;
and when a selection operation on either the first option or the second option is detected, taking the preference information corresponding to the selected option as the audience user's preference information for the anchor user.
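On the wire, the terminal-to-server exchange of module 1004 amounts to posting the selected labels and the chosen option. A minimal sketch follows; the endpoint path and JSON field names are assumptions, not defined by the patent.

```python
import requests  # third-party HTTP client: pip install requests

def submit_feedback(server_url, anchor_id, selected_labels, liked):
    """Send the audience user's description labels and like/dislike choice
    to the server, which uses them to update the anchor's fingerprint info."""
    payload = {
        "anchor_id": anchor_id,
        "description_labels": sorted(selected_labels),  # from the label selection UI
        "preference": "like" if liked else "dislike",   # first vs. second option
    }
    resp = requests.post(f"{server_url}/portrait/feedback", json=payload, timeout=5)
    resp.raise_for_status()
    return resp.json()
```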
In one possible implementation, the display module 1001 is further configured to simultaneously display, when displaying the video file of the anchor user, at least one of the song names and song categories that the anchor user is good at singing.
In the embodiment of the invention, the description labels selected by the audience user for the anchor user and the audience user's preference information for the anchor user and the video file are obtained according to the audience user's operations, and the obtained description labels and preference information are sent to the server, so that the server can update the fingerprint information according to this information and then generate the user portrait of the anchor user.
It should be noted that: the user portrait generation apparatus provided in the above embodiments is illustrated only by the division of the above functional modules when generating a user portrait; in practical applications, the functions may be distributed among different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the user portrait generation apparatus provided in the above embodiments belongs to the same concept as the user portrait generation method embodiments; its specific implementation process is detailed in the method embodiments and is not described here again.
Fig. 11 is a schematic structural diagram of a terminal 1100 according to an embodiment of the present invention. The terminal 1100 may be: a smart phone, a tablet computer, an MP3 player (Moving Picture Experts Group Audio Layer III), an MP4 player (Moving Picture Experts Group Audio Layer IV), a notebook computer, or a desktop computer. Terminal 1100 may also be referred to by other names such as user equipment, portable terminal, laptop terminal, desktop terminal, and so forth.
In general, terminal 1100 includes: a processor 1101 and a memory 1102.
Processor 1101 may include one or more processing cores, such as a 4-core processor, an 8-core processor, or the like. The processor 1101 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 1101 may also include a main processor and a coprocessor, where the main processor is a processor for Processing data in an awake state, and is also called a Central Processing Unit (CPU); a coprocessor is a low power processor for processing data in a standby state. In some embodiments, the processor 1101 may be integrated with a GPU (Graphics Processing Unit) that is responsible for rendering and drawing the content that the display screen needs to display. In some embodiments, the processor 1101 may further include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
Memory 1102 may include one or more computer-readable storage media, which may be non-transitory. Memory 1102 may also include high-speed random access memory as well as non-volatile memory, such as one or more magnetic disk storage devices or flash memory storage devices. In some embodiments, a non-transitory computer-readable storage medium in memory 1102 is used to store at least one instruction for execution by processor 1101 to implement the user portrait generation method provided by the method embodiments herein.
In some embodiments, the terminal 1100 may further include: a peripheral interface 1103 and at least one peripheral. The processor 1101, memory 1102 and peripheral interface 1103 may be connected by a bus or signal lines. Various peripheral devices may be connected to the peripheral interface 1103 by buses, signal lines, or circuit boards. Specifically, the peripheral device includes: at least one of radio frequency circuitry 1104, display screen 1105, camera 1106, audio circuitry 1107, positioning component 1108, and power supply 1109.
The peripheral interface 1103 may be used to connect at least one peripheral associated with I/O (Input/Output) to the processor 1101 and the memory 1102. In some embodiments, the processor 1101, memory 1102, and peripheral interface 1103 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 1101, the memory 1102 and the peripheral device interface 1103 may be implemented on separate chips or circuit boards, which is not limited by this embodiment.
The Radio Frequency circuit 1104 is used to receive and transmit RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 1104 communicates with communication networks and other communication devices via electromagnetic signals. The radio frequency circuit 1104 converts an electrical signal into an electromagnetic signal for transmission, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 1104 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 1104 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: metropolitan area networks, mobile communication networks of various generations (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 1104 may further include NFC (Near Field Communication) related circuits, which is not limited in this application.
The display screen 1105 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 1105 is a touch display screen, the display screen 1105 also has the ability to capture touch signals on or over its surface. The touch signal may be input to the processor 1101 as a control signal for processing. At this point, the display screen 1105 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 1105, provided on the front panel of terminal 1100; in other embodiments, there may be at least two display screens 1105, respectively disposed on different surfaces of the terminal 1100 or in a folded design; in still other embodiments, the display screen 1105 may be a flexible display disposed on a curved surface or a folded surface of terminal 1100. The display screen 1105 may even be arranged in a non-rectangular irregular pattern, that is, an irregularly shaped screen. The display screen 1105 may be made of materials such as an LCD (Liquid Crystal Display) or an OLED (Organic Light-Emitting Diode).
Camera assembly 1106 is used to capture images or video. Optionally, camera assembly 1106 includes a front camera and a rear camera. Generally, a front camera is disposed at a front panel of the terminal, and a rear camera is disposed at a rear surface of the terminal. In some embodiments, the number of the rear cameras is at least two, and each rear camera is any one of a main camera, a depth-of-field camera, a wide-angle camera and a telephoto camera, so that the main camera and the depth-of-field camera are fused to realize a background blurring function, and the main camera and the wide-angle camera are fused to realize panoramic shooting and VR (Virtual Reality) shooting functions or other fusion shooting functions. In some embodiments, camera assembly 1106 may also include a flash. The flash lamp can be a monochrome temperature flash lamp or a bicolor temperature flash lamp. The double-color-temperature flash lamp is a combination of a warm-light flash lamp and a cold-light flash lamp, and can be used for light compensation at different color temperatures.
The audio circuitry 1107 may include a microphone and a speaker. The microphone is used to collect sound waves from the user and the environment, convert the sound waves into electrical signals, and input them to the processor 1101 for processing or to the radio frequency circuit 1104 for voice communication. For stereo capture or noise reduction purposes, multiple microphones may be provided, each at a different location on terminal 1100. The microphone may also be an array microphone or an omnidirectional pickup microphone. The speaker is used to convert electrical signals from the processor 1101 or the radio frequency circuit 1104 into sound waves. The speaker may be a conventional thin-film speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, it can convert an electrical signal into sound waves audible to humans, or into sound waves inaudible to humans for purposes such as distance measurement. In some embodiments, the audio circuitry 1107 may also include a headphone jack.
Positioning component 1108 is used to determine the current geographic location of terminal 1100 to implement navigation or LBS (Location Based Service). The positioning component 1108 may be a positioning component based on the United States' GPS (Global Positioning System), China's BeiDou system, Russia's GLONASS system, or the European Union's Galileo system.
Power supply 1109 is configured to supply power to the various components in terminal 1100. The power supply 1109 may be an alternating current supply, a direct current supply, a disposable battery, or a rechargeable battery. When the power supply 1109 includes a rechargeable battery, the rechargeable battery may support wired or wireless charging. The rechargeable battery may also be used to support fast-charging technology.
In some embodiments, terminal 1100 can also include one or more sensors 1110. The one or more sensors 1110 include, but are not limited to: acceleration sensor 1111, gyro sensor 1112, pressure sensor 1113, fingerprint sensor 1114, optical sensor 1115, and proximity sensor 1116.
Acceleration sensor 1111 may detect acceleration levels in three coordinate axes of a coordinate system established with terminal 1100. For example, the acceleration sensor 1111 may be configured to detect components of the gravitational acceleration in three coordinate axes. The processor 1101 may control the touch display screen 1105 to display a user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 1111. The acceleration sensor 1111 may also be used for acquisition of motion data of a game or a user.
The gyro sensor 1112 may detect a body direction and a rotation angle of the terminal 1100, and the gyro sensor 1112 may cooperate with the acceleration sensor 1111 to acquire a 3D motion of the user with respect to the terminal 1100. From the data collected by gyroscope sensor 1112, processor 1101 may implement the following functions: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization at the time of photographing, game control, and inertial navigation.
Pressure sensor 1113 may be disposed on a side bezel of terminal 1100 and/or on an underlying layer of touch display screen 1105. When the pressure sensor 1113 is disposed on the side frame of the terminal 1100, the holding signal of the terminal 1100 from the user can be detected, and the processor 1101 performs left-right hand recognition or shortcut operation according to the holding signal collected by the pressure sensor 1113. When the pressure sensor 1113 is disposed at the lower layer of the touch display screen 1105, the processor 1101 controls the operability control on the UI interface according to the pressure operation of the user on the touch display screen 1105. The operability control comprises at least one of a button control, a scroll bar control, an icon control and a menu control.
The fingerprint sensor 1114 is configured to collect the user's fingerprint, and the processor 1101 identifies the user according to the fingerprint collected by the fingerprint sensor 1114, or the fingerprint sensor 1114 identifies the user according to the collected fingerprint. Upon recognizing the user's identity as a trusted identity, the processor 1101 authorizes the user to perform relevant sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, changing settings, and the like. The fingerprint sensor 1114 may be disposed on the front, back, or side of terminal 1100. When a physical button or vendor logo is provided on the terminal 1100, the fingerprint sensor 1114 may be integrated with the physical button or vendor logo.
Optical sensor 1115 is used to collect ambient light intensity. In one embodiment, the processor 1101 may control the display brightness of the touch display screen 1105 based on the ambient light intensity collected by the optical sensor 1115. Specifically, when the ambient light intensity is high, the display brightness of the touch display screen 1105 is increased; when the ambient light intensity is low, the display brightness of the touch display screen 1105 is turned down. In another embodiment, processor 1101 may also dynamically adjust the shooting parameters of camera assembly 1106 based on the ambient light intensity collected by optical sensor 1115.
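As a generic illustration of this kind of ambient-light-driven adjustment (a sketch, not code from the patent), brightness can simply be scaled with the measured light intensity; the linear mapping and bounds here are assumptions.

```python
def brightness_for_ambient_light(ambient_lux, min_level=0.2, max_level=1.0,
                                 full_scale_lux=1000.0):
    """Map ambient light intensity (lux) to a display brightness fraction:
    dim surroundings give a low level, bright surroundings give the maximum."""
    clipped = min(ambient_lux, full_scale_lux)
    return round(min_level + (max_level - min_level) * clipped / full_scale_lux, 2)

print(brightness_for_ambient_light(50))    # dim room  -> 0.24
print(brightness_for_ambient_light(1200))  # daylight  -> 1.0
```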
Proximity sensor 1116, also referred to as a distance sensor, is typically disposed on the front panel of terminal 1100. Proximity sensor 1116 is used to measure the distance between the user and the front face of terminal 1100. In one embodiment, when the proximity sensor 1116 detects that the distance between the user and the front face of the terminal 1100 is gradually decreasing, the processor 1101 controls the touch display screen 1105 to switch from the bright-screen state to the dark-screen state; when the proximity sensor 1116 detects that the distance is gradually increasing, the processor 1101 controls the touch display screen 1105 to switch from the dark-screen state to the bright-screen state.
Those skilled in the art will appreciate that the configuration shown in fig. 11 does not constitute a limitation of terminal 1100, and may include more or fewer components than those shown, or may combine certain components, or may employ a different arrangement of components.
Fig. 12 is a schematic structural diagram of a server 1200 according to an embodiment of the present invention. The server 1200 may vary greatly in configuration or performance, and may include one or more processors (CPUs) 1201 and one or more memories 1202, where the memory 1202 stores at least one instruction that is loaded and executed by the processor 1201 to implement the methods provided by the foregoing method embodiments. Of course, the server may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for input and output, and the server may also include other components for implementing device functions, which are not described here again.
In an exemplary embodiment, a computer-readable storage medium, such as a memory storing a computer program, is also provided; when the computer program is executed by a processor, the user portrait generation method in the above embodiments is implemented. For example, the computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a Compact Disc Read-Only Memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, or the like.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, and the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only exemplary of the present invention and should not be taken as limiting the invention, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (28)

1. A user portrait generation method, the method comprising:
analyzing a video file of an anchor user to acquire fingerprint information of the anchor user, wherein the video file is generated based on video data and audio data collected when the anchor user carries out song singing and live broadcasting, the fingerprint information is used for describing the anchor user, and the fingerprint information comprises at least one of a video fingerprint and an audio fingerprint;
acquiring a description label selected by an audience user for the anchor user according to the video file, and the audience user's preference information for the anchor user and the video file, wherein the preference information comprises likes and dislikes;
updating the fingerprint information of the anchor user according to the description label and the preference information;
and generating a user portrait of the anchor user according to the updated fingerprint information.
2. The method of claim 1, wherein analyzing the video file of the anchor user to obtain fingerprint information of the anchor user comprises:
performing sound analysis on the video file and analyzing song information corresponding to the video file to acquire the audio fingerprint of the anchor user, wherein the song information comprises at least one of a song name and a song category, and the audio fingerprint is used for describing the song singing behavior of the anchor user.
3. The method of claim 2, wherein the analyzing the sound of the video file and the song information corresponding to the video file to obtain the audio fingerprint of the anchor user comprises:
performing sound analysis on the video file, and determining the similarity between the anchor user's singing and the original singer's performance of the song corresponding to the video file;
analyzing song information corresponding to the video file, and counting the number of times the anchor user has sung each song and each category of song;
and determining, according to the similarity and the counts, at least one of the song name and the song category that the anchor user is good at singing.
4. The method of claim 2, wherein the analyzing the sound of the video file and the song information corresponding to the video file to obtain the audio fingerprint of the anchor user comprises:
performing sound analysis on the video file to obtain a singing score for the anchor user's performance of the song corresponding to the video file;
analyzing song information corresponding to the video file, and counting the number of times the anchor user has sung each song and each category of song;
and determining, according to the singing score and the counts, at least one of the song name and the song category that the anchor user is good at singing.
5. The method according to claim 1 or 2, wherein the analyzing the video file of the anchor user to obtain the fingerprint information of the anchor user comprises:
performing character analysis and scene analysis on the video file to acquire the video fingerprint of the anchor user, wherein the video fingerprint is used for describing the appearance and the live video quality of the anchor user.
6. The method of claim 5, wherein performing character analysis and scene analysis on the video file to obtain the video fingerprint of the anchor user comprises:
performing face recognition, clothing analysis, and scene analysis on the video file, and determining the video frame resolution and the gender and face-value score of the anchor user.
7. The method of claim 1, wherein after the generating a user portrait of the anchor user according to the updated fingerprint information, the method further comprises:
recommending live broadcast rooms to audience users according to the user portraits of the anchor users.
8. The method of claim 7, wherein the recommending live broadcast rooms to audience users according to the user portraits of the anchor users comprises:
when it is detected that any audience user logs in to a server, recommending to the audience user a live broadcast room of an anchor user matching a first historical behavior of the audience user according to the first historical behavior and the user portrait of each anchor user, wherein the first historical behavior comprises the description labels selected by the audience user for anchor users and the audience user's preference information for those anchor users.
9. The method of claim 7, wherein the recommending live broadcast rooms to audience users according to the user portraits of the anchor users comprises:
when it is detected that any audience user plays a song, recommending to the audience user a live broadcast room of an anchor user matching a second historical behavior of the audience user according to the second historical behavior and the user portrait of each anchor user, wherein the second historical behavior comprises the audience user's historical play, favorite, and collection operations on songs.
10. A user portrait generation method, the method comprising:
displaying an anchor selection interface, wherein the anchor selection interface is used for providing a plurality of anchor users for selection by an audience user;
when the selection operation of any anchor user is detected, displaying the video file of the anchor user;
when the playing operation of the video file is detected, playing the video file;
during or after the video file is played, acquiring a description label selected by a viewer user for the anchor user and preference information of the viewer user for the anchor user and the video file, wherein the preference information comprises likes and dislikes;
and sending the description label and the preference information to a server, wherein the description label and the preference information are used by the server to update fingerprint information of the anchor user and to generate a user portrait of the anchor user based on the updated fingerprint information; the fingerprint information is obtained by the server by analyzing a video file generated based on video data and audio data collected when the anchor user carries out song singing and live broadcasting, is used for describing the anchor user, and comprises at least one of a video fingerprint and an audio fingerprint.
11. The method of claim 10, wherein said obtaining the descriptive tags selected by the viewer user for the anchor user and the viewer user's preference information for the anchor user and the video file comprises:
displaying a plurality of preset description labels, a first option and a second option, wherein the first option and the second option correspond to different preference information;
when a selection operation of at least one description label in the plurality of description labels is detected, the at least one description label is used as a description label selected by an audience user for the anchor user;
and when the selection operation of any one of the first option and the second option is detected, taking preference information corresponding to the option as preference information of the audience user to the anchor user.
12. The method of claim 10, further comprising:
and simultaneously displaying, when the video file of the anchor user is displayed, at least one of the song name and the song category that the anchor user is good at singing.
13. A user portrait generation apparatus, the apparatus comprising:
an obtaining module, a storage module, and a processing module, wherein the obtaining module is configured to analyze a video file of an anchor user and acquire fingerprint information of the anchor user, the video file is generated based on video data and audio data collected when the anchor user carries out song singing and live broadcasting, the fingerprint information is used for describing the anchor user, and the fingerprint information comprises at least one of a video fingerprint and an audio fingerprint;
the obtaining module is further configured to acquire a description label selected by an audience user for the anchor user according to the video file, and the audience user's preference information for the anchor user and the video file, wherein the preference information comprises likes and dislikes;
the updating module is used for updating the fingerprint information of the anchor user according to the description label and the preference information;
and the generating module is used for generating the user portrait of the anchor user according to the updated fingerprint information.
14. The apparatus of claim 13, wherein the obtaining module is configured to:
performing sound analysis on the video file and analyzing song information corresponding to the video file to acquire the audio fingerprint of the anchor user, wherein the song information comprises at least one of a song name and a song category, and the audio fingerprint is used for describing the song singing behavior of the anchor user.
15. The apparatus of claim 14, wherein the obtaining module is configured to:
performing sound analysis on the video file, and determining the similarity between the anchor user's singing and the original singer's performance of the song corresponding to the video file;
analyzing song information corresponding to the video file, and counting the number of times the anchor user has sung each song and each category of song;
and determining, according to the similarity and the counts, at least one of the song name and the song category that the anchor user is good at singing.
16. The apparatus of claim 14, wherein the obtaining module is configured to:
performing sound analysis on the video file to obtain a singing score for the anchor user's performance of the song corresponding to the video file;
analyzing song information corresponding to the video file, and counting the number of times the anchor user has sung each song and each category of song;
and determining, according to the singing score and the counts, at least one of the song name and the song category that the anchor user is good at singing.
17. The apparatus of claim 13 or 14, wherein the obtaining module is configured to:
performing character analysis and scene analysis on the video file to acquire the video fingerprint of the anchor user, wherein the video fingerprint is used for describing the appearance and the live video quality of the anchor user.
18. The apparatus of claim 17, wherein the obtaining module is configured to perform face recognition, clothing analysis, and scene analysis on the video file to determine the video frame resolution and the gender and face-value score of the anchor user.
19. The apparatus of claim 13, further comprising:
a recommending module, configured to recommend live broadcast rooms to audience users according to the user portraits of the anchor users.
20. The apparatus of claim 19, wherein the recommending module is configured to, when it is detected that any audience user logs in to the server, recommend to the audience user a live broadcast room of an anchor user matching a first historical behavior of the audience user according to the first historical behavior and the user portrait of each anchor user, wherein the first historical behavior comprises the description labels selected by the audience user for anchor users and the audience user's preference information for those anchor users.
21. The apparatus of claim 19, wherein the recommending module is configured to, when it is detected that any audience user plays a song, recommend to the audience user a live broadcast room of an anchor user matching a second historical behavior of the audience user according to the second historical behavior and the user portrait of each anchor user, wherein the second historical behavior comprises the audience user's historical play, favorite, and collection operations on songs.
22. A user portrait generation apparatus, the apparatus comprising:
a display module, configured to display an anchor selection interface, wherein the anchor selection interface is used for providing a plurality of anchor users for selection by audience users;
the display module is further configured to display the video file of any anchor user when a selection operation on the anchor user is detected;
the playing module is used for playing the video file when the playing operation of the video file is detected;
an obtaining module, configured to obtain, during or after playing of the video file, a description tag selected by a viewer user for the anchor user and preference information of the viewer user for the anchor user and the video file, where the preference information includes likes and dislikes;
a sending module, configured to send the description label and the preference information to a server, wherein the description label and the preference information are used by the server to update fingerprint information of the anchor user and to generate a user portrait of the anchor user based on the updated fingerprint information; the fingerprint information is obtained by the server by analyzing a video file generated based on video data and audio data collected when the anchor user carries out song singing and live broadcasting, is used for describing the anchor user, and comprises at least one of a video fingerprint and an audio fingerprint.
23. The apparatus of claim 22, wherein the obtaining module is configured to:
displaying a plurality of preset description labels, a first option and a second option, wherein the first option and the second option correspond to different preference information;
when a selection operation of at least one description label in the plurality of description labels is detected, the at least one description label is used as a description label selected by an audience user for the anchor user;
and when the selection operation of any one of the first option and the second option is detected, taking preference information corresponding to the option as preference information of the audience user to the anchor user.
24. The apparatus of claim 22, wherein the display module is further configured to simultaneously display, when displaying the video file of the anchor user, at least one of the song name and the song category that the anchor user is good at singing.
25. A user portrait generation system, comprising a first terminal, a server, and a second terminal, wherein
the first terminal is configured to generate a video file of an anchor user based on video data and audio data collected during a time period in which the anchor user carries out song singing and live broadcasting, and to send the video file of the anchor user to the server;
the server is configured to analyze the video file of the anchor user and acquire fingerprint information of the anchor user, wherein the fingerprint information comprises at least one of a video fingerprint and an audio fingerprint;
the second terminal is configured to, when a play operation on a video file of any anchor user is detected, play the video file corresponding to the play operation, acquire the description label selected by an audience user for the anchor user and the audience user's preference information for the anchor user, and send the description label and the preference information to the server, wherein the preference information comprises likes and dislikes;
the server is further used for updating the fingerprint information of the anchor user according to the description label and the preference information, and generating a user portrait of the anchor user according to the updated fingerprint information.
26. A server, comprising a processor and a memory, wherein the memory is used for storing a computer program, and the processor is configured to execute the computer program stored in the memory to implement the method steps of any one of claims 1-9.
27. A terminal, comprising a processor and a memory, wherein the memory is used for storing a computer program, and the processor is configured to execute the computer program stored in the memory to implement the method steps of any one of claims 10-12.
28. A computer-readable storage medium, wherein a computer program is stored in the computer-readable storage medium, and the computer program, when executed by a processor, implements the method steps of any one of claims 1-12.