CN110460903A - Method, apparatus and computer device for commenting on programs based on speech analysis - Google Patents

Method, apparatus and computer device for commenting on programs based on speech analysis

Info

Publication number
CN110460903A
CN110460903A (application CN201910651425.1A)
Authority
CN
China
Prior art keywords
mood
user
program
comment
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910651425.1A
Other languages
Chinese (zh)
Inventor
赵付利
文莉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201910651425.1A
Priority to PCT/CN2019/116702 (WO2021008025A1)
Publication of CN110460903A
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/26 Speech to text systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/442 Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, the storage space available from the internal hard disk
    • H04N 21/44213 Monitoring of end-user related data
    • H04N 21/44218 Detecting physical presence or behaviour of the user, e.g. using sensors to detect if the user is leaving the room or changes his face expression during a TV program
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/47 End-user applications
    • H04N 21/475 End-user interface for inputting end-user data, e.g. personal identification number [PIN], preference data
    • H04N 21/4756 End-user interface for inputting end-user data, e.g. personal identification number [PIN], preference data for rating content, e.g. scoring a recommended movie
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/47 End-user applications
    • H04N 21/488 Data services, e.g. news ticker
    • H04N 21/4884 Data services, e.g. news ticker for displaying subtitles

Abstract

This application discloses a method, apparatus and computer device for commenting on programs based on speech analysis. The method includes: receiving video information collected by a terminal; analyzing the video information to obtain the program information corresponding to the video information; loading a comment interface for the program information onto the terminal, the comment interface including a voice port through which a user inputs a comment; converting the voice information input by the user through the voice port into text; and uploading the text to the comment interface. The application receives the voice signal input by the user and automatically converts it into text for posting as a comment; compared with typed input, this saves the time spent entering a comment and does not interrupt the user's viewing of the program.

Description

Method, apparatus and computer device for commenting on programs based on speech analysis
Technical field
This application relates to the field of computer technology, and in particular to a method, apparatus and computer device for commenting on programs based on speech analysis.
Background technique
When commenting on an Internet TV program or a radio program, from the user's perspective, the user may comment on the program regardless of whether he or she has watched or listened to it; and from the perspective of time, the user may comment on the program before it starts playing, during playback, and after playback ends.
However, when a user posts a text comment while watching a program, typing takes a certain amount of time, so the user may miss part of the program. Moreover, composing a textual description is time-consuming and cannot quickly reflect the user's opinion of the program. In addition, plain text is a limited medium and cannot express the user's opinion of the program very accurately.
Summary of the invention
The main purpose of this application is to provide a method, apparatus and computer device for commenting on programs based on speech analysis, aiming to solve the above problem that commenting on a program is time-consuming.
To achieve the above object, this application proposes a method for commenting on programs based on speech analysis, comprising:
receiving video information collected by a terminal;
analyzing the video information to obtain the program information corresponding to the video information;
loading a comment interface for the program information onto the terminal, the comment interface including a voice port through which a user inputs a comment;
converting the voice information input by the user through the voice port into text;
uploading the text to the comment interface.
Further, the step of uploading the text to the comment interface comprises:
obtaining a first mood of the user according to the voice information;
searching an expression library, according to the first mood, for an expression corresponding to the first mood;
uploading the text and the expression to the comment interface.
Further, the step of obtaining the first mood of the user according to the voice information comprises:
extracting the mood words in the text;
according to the mood words, invoking a correspondence between mood words and moods to obtain the first mood of the user.
Further, the step of obtaining the first mood of the user according to the voice information comprises:
inputting the voice information into a preset voice mood recognition model, and outputting the first mood corresponding to the voice information.
Further, the video information is video information shot by a camera of the terminal, and the step of analyzing the video information to obtain the program information corresponding to the video information comprises:
extracting at least two frame pictures from the video information, and a program video from a server;
performing a similarity calculation between each of the at least two frame pictures and each frame in the program video, to obtain at least two program frames in one-to-one correspondence with the at least two frame pictures, together with the corresponding similarity values;
if the at least two similarity values are all above a preset similarity threshold, calculating whether the time interval of the at least two frame pictures is the same as that of the corresponding at least two program frames;
if so, determining that the program is the program corresponding to the video information.
Further, before the step of uploading the text and the expression to the comment interface, the method comprises:
controlling a camera of the terminal to collect facial information of the user;
inputting the facial information into a preset facial mood recognition model, and outputting a second mood corresponding to the facial information;
judging whether the first mood and the second mood are the same;
if so, generating an instruction to upload the text and the expression to the comment interface.
Further, the step of loading the comment interface for the program information onto the terminal comprises:
obtaining the comment type of the user;
loading the comment interface corresponding to the comment type onto the terminal.
This application also provides an apparatus for commenting on programs based on speech analysis, comprising:
a receiving module, configured to receive video information collected by a terminal;
an analysis module, configured to analyze the video information to obtain the program information corresponding to the video information;
a loading module, configured to load a comment interface for the program information onto the terminal, the comment interface including a voice port through which a user inputs a comment;
a conversion module, configured to convert the voice information input by the user through the voice port into text;
an uploading module, configured to upload the text to the comment interface.
This application also provides a computer device, including a memory and a processor, wherein the memory stores a computer program and the processor implements the steps of any of the above methods when executing the computer program.
This application also provides a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the steps of any of the above methods are implemented.
In the method, apparatus and computer device for commenting on programs based on speech analysis of this application, the voice signal input by the user is received and automatically converted into text for posting as a comment; compared with typed input, this saves the time spent entering a comment and does not interrupt the user's viewing of the program. During conversion into text, an expression corresponding to the text in the voice is added automatically; at the same time, an expression corresponding to the user's mood when speaking is also added automatically, so that the user's feelings about the program are expressed more intuitively. The user's facial expression is also extracted, and a corresponding expression is added according to it. Through these several techniques, the user's mood while watching the program is obtained and corresponding expressions are added according to the mood, so that comments on the program are posted more truthfully and rapidly without affecting the user's viewing experience.
Detailed description of the invention
Fig. 1 is a schematic flowchart of a method for commenting on programs based on speech analysis according to an embodiment of this application;
Fig. 2 is a schematic structural block diagram of an apparatus for commenting on programs based on speech analysis according to an embodiment of this application;
Fig. 3 is a schematic structural block diagram of a computer device according to an embodiment of this application.
The realization of the objects, functional characteristics and advantages of this application will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Specific embodiment
In order to make the objects, technical solutions and advantages of this application clearer, this application is further elaborated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain this application, and are not intended to limit it.
Referring to Fig. 1, an embodiment of this application provides a method for commenting on programs based on speech analysis, comprising the steps of:
S1: receiving video information collected by a terminal;
S2: analyzing the video information to obtain the program information corresponding to the video information;
S3: loading a comment interface for the program information onto the terminal, the comment interface including a voice port through which a user inputs a comment;
S4: converting the voice information input by the user through the voice port into text;
S5: uploading the text to the comment interface.
In this embodiment, the video information collected by the terminal may be video information played by the terminal itself, or video information generated by the terminal shooting with its camera. In the first case, when the terminal plays video information, the server loads onto the terminal a window for sending the video information; after the user clicks this window, the terminal sends the video information to the server, and the server receives the video information sent by the terminal. In the second case, the server loads onto the terminal a window for shooting video information; after the user clicks the window, the terminal starts the camera, shoots, and sends the shot video information to the server in real time, and the server receives the video information shot by the terminal.
After the server receives the video information: in the first case, the video information includes the video content and various items of information related to the video, such as video length, bit rate, frame rate and video name, where the video content refers to the program information, such as the title of a TV series, variety show or movie. The server reads the video content in the video information and analyzes it to obtain the program information. In the second case, the server reads the pictures in the video information, compares them with multiple programs stored on the server, finds the program whose frames are identical, and then reads the information of that program, thereby analyzing out the program information corresponding to the video information. Further, in the second case, the sound in the video information can also be analyzed and converted into text; that is, the sound in the video information is collected, and the program information corresponding to the video information is judged from the text corresponding to the sound. Meanwhile, after the server obtains the program information corresponding to the video information, it can also load other users' comments on the program onto the terminal, so that the user can view other users' comments on the program and exchange with other users.
After the program information is obtained, the comment interface for commenting on this program information is called up on the server and then loaded onto the terminal. When the server has no comment interface for this program information, a comment template on the server is called instead, and the program information is loaded into the corresponding position in the comment template to form the comment interface for this program information, which is then loaded onto the terminal. The comment interface has a port through which the user inputs comments; the user may input the text of a comment on this port, or input voice through this port to comment on the program information.
When the user comments on the program by voice input, the server converts the voice information that the terminal passes through this port into text using speech analysis technology, and then uploads the text to the comment interface. In this way, the user can rapidly and immediately comment by voice on the program being watched, without interrupting the viewing.
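The server-side flow just described (receive voice from the voice port, convert it to text, upload the text to the comment interface) can be sketched as follows. This is a minimal illustration, not the patent's implementation: `transcribe` stands in for whatever speech-recognition engine the server actually uses, and all names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class CommentBoard:
    """A program's comment interface: an ordered list of posted comments."""
    program: str
    comments: List[str] = field(default_factory=list)

def post_voice_comment(audio: bytes,
                       transcribe: Callable[[bytes], str],
                       board: CommentBoard) -> str:
    """Convert voice input to text (S4) and upload it to the comment interface (S5)."""
    text = transcribe(audio)      # speech analysis converts voice to text
    board.comments.append(text)   # upload the text to the comment interface
    return text

# Example with a stub recognizer standing in for a real ASR engine:
board = CommentBoard(program="Some Variety Show")
stub_asr = lambda audio: "what a great episode"
post_voice_comment(b"<pcm bytes>", stub_asr, board)
print(board.comments)  # ['what a great episode']
```

The point of the shape is that the user's viewing is never blocked: the conversion and upload happen server-side after the voice leaves the port.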
In one embodiment, the above step of uploading the text to the comment interface comprises:
S51: obtaining a first mood of the user according to the voice information;
S52: searching an expression library, according to the first mood, for an expression corresponding to the first mood;
S53: uploading the text and the expression to the comment interface.
In this embodiment, when the user sends voice, commenting according to his or her likes and dislikes of the program or on a role in the program, the voice carries the user's own emotion, so the user's mood, i.e., the first mood of the user's comment on the program, can be extracted from the voice information. Moods include many kinds, such as grief, fear, surprise, acceptance, ecstasy, rage, vigilance and hatred. The server obtains the user's first mood from the voice information, then loads the expressions corresponding to the first mood for the user to select. The text corresponding to the voice information of the user's comment on the program, together with the expression corresponding to the voice information, is then uploaded to the comment interface (the text is loaded first, then the expression), so that the text and the expression appear together in the comment interface as one comment. This expresses the comment on the program information more intuitively and allows other users to grasp the user's comment on the program rapidly. The expression library contains multiple expressions, each carrying a mood label; after staff attach a mood label to an expression, the expression is uploaded into the expression library, which is stored in a designated space on the server. When sending a message, people can use voice faster; when reading a message, pictures are read fastest, then text, then voice. This scheme converts voice information into text information and picture information, making it convenient for other users to rapidly understand the user's comment on the program.
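The expression library described here is essentially a labeled store queried by mood (step S52). A minimal sketch, with entirely hypothetical expression names:

```python
# Each expression carries one mood label attached by staff (hypothetical entries).
EXPRESSION_LIBRARY = [
    {"name": ":rage:", "mood": "angry"},
    {"name": ":sob:", "mood": "sad"},
    {"name": ":grin:", "mood": "happy"},
    {"name": ":laughing:", "mood": "happy"},
]

def expressions_for(mood: str):
    """S52: look up all expressions whose mood label matches the first mood."""
    return [e["name"] for e in EXPRESSION_LIBRARY if e["mood"] == mood]

print(expressions_for("happy"))  # [':grin:', ':laughing:']
```

One mood can map to several expressions, which is why the server loads a set for the user to select from rather than attaching one automatically.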
In one embodiment, the above step of obtaining the first mood of the user according to the voice information comprises:
S511: extracting the mood words in the text;
S512: according to the mood words, invoking the correspondence between mood words and moods to obtain the first mood of the user.
In this embodiment, the above text refers to the text converted from the voice information. Mood words are stored in a mood dictionary. After the server converts the voice information into text, the words in the text that are identical to words in the mood dictionary are extracted; that is, the mood words in the text are extracted. Then the correspondence between mood words and moods is invoked to obtain the first mood of the user. The mood words are sorted out and collected by staff, and each mood word is then mapped to a mood; one mood may correspond to multiple mood words. In one embodiment, the correspondence between mood words and moods is as follows:
Mood words                        Mood
tired, gloomy, nauseous           angry
crowing, dancing, singing         happy
terror, trembling                 fearful
crying, tears, anxiety            sad
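Steps S511 and S512 amount to a dictionary lookup over a table like the one above. A minimal sketch (the word lists mirror the table; the whitespace tokenizer and first-hit rule are assumptions for illustration, since the patent does not specify them):

```python
# S512's correspondence: one mood may correspond to multiple mood words.
MOOD_OF_WORD = {
    "tired": "angry", "gloomy": "angry", "nauseous": "angry",
    "crowing": "happy", "dancing": "happy", "singing": "happy",
    "terror": "fearful", "trembling": "fearful",
    "crying": "sad", "tears": "sad", "anxiety": "sad",
}

def first_mood(text: str):
    """S511: extract mood words from the text; S512: map the first hit to a mood."""
    for word in text.lower().split():
        mood = MOOD_OF_WORD.get(word.strip(".,!?"))
        if mood:
            return mood
    return None  # no mood word found in the text

print(first_mood("I was crying at the ending!"))  # sad
```

A production mood dictionary would of course be far larger and language-specific; the lookup structure is the point here.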
In one embodiment, the above step of obtaining the first mood of the user according to the voice information comprises:
S513: inputting the voice information into a preset voice mood recognition model, and outputting the first mood corresponding to the voice information.
In this embodiment, when the user speaks in different moods, the signals corresponding to the voice information are not identical. When the user speaks two passages in the same mood, the energy features, voiced-frame-count features, fundamental frequency features, formant features, harmonics-to-noise-ratio features and mel-frequency cepstral coefficient features of the two passages of voice information show obvious commonality. The voice information is input into the voice mood recognition model, which computes on it and outputs the first mood corresponding to the voice information.
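The passage lists energy, voiced-frame count and fundamental frequency among the features such a model consumes. As an illustration of what extracting such features from a raw signal means (not the patent's actual front end; the frame length, hop size and lag bounds are arbitrary assumptions), two of those features can be computed with plain NumPy:

```python
import numpy as np

def frame_energies(signal: np.ndarray, frame_len: int = 400, hop: int = 160) -> np.ndarray:
    """Short-time energy per frame (frame/hop sizes are illustrative)."""
    n = 1 + max(0, (len(signal) - frame_len) // hop)
    return np.array([np.sum(signal[i*hop : i*hop+frame_len] ** 2) for i in range(n)])

def fundamental_freq(signal: np.ndarray, sr: int, fmin: float = 50.0) -> float:
    """Crude F0 estimate from the autocorrelation peak (for clean voiced signals)."""
    ac = np.correlate(signal, signal, mode="full")[len(signal)-1:]
    lag_min = int(sr / 400)   # ignore lags corresponding to pitches above 400 Hz
    lag_max = int(sr / fmin)
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max]))
    return sr / lag

sr = 16000
t = np.arange(4000) / sr
tone = np.sin(2 * np.pi * 200 * t)  # 0.25 s test tone at 200 Hz
print(round(fundamental_freq(tone, sr)))  # 200
```

Real emotion front ends use more robust pitch trackers and add formants, HNR and MFCCs, but each feature is, like these, a deterministic function of the waveform that the model then maps to a mood.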
In one embodiment, the above video information is video information shot by the camera of the terminal, and the above step of analyzing the video information to obtain the program information corresponding to the video information comprises:
S21: extracting at least two frame pictures from the video information, and a program video from the server;
S22: performing a similarity calculation between each of the at least two frame pictures and each frame in the program video, to obtain at least two program frames in one-to-one correspondence with the at least two frame pictures, together with the corresponding similarity values;
S23: if the at least two similarity values are all above a preset similarity threshold, calculating whether the time interval of the at least two frame pictures is the same as that of the corresponding at least two program frames;
S24: if so, determining that the program is the program corresponding to the video information.
In this embodiment, the terminal is the user's mobile phone. While watching TV, the user sees a variety show and wants to comment on it, so the user picks up the phone and opens the function or application for commenting on programs, which controls the camera to take photos or record video. If taking photos, at least two are taken, with a shooting interval of at least 5 seconds; if recording video, at least 5 seconds are recorded, and the first frame and the last frame of the recording are selected. Two frame pictures of the video information are thus extracted by the phone. Meanwhile, the phone accesses the server specified in the function or application and extracts a program video from the server, then calculates the similarity between the two frame pictures and all frame pictures in the program video: each frame picture of the video information is compared with every frame picture of the program video, and the highest similarity for each is selected, yielding two similarity values. It is then judged whether these two similarity values are both above the preset similarity threshold; if so, the program video on the server may well be the same as the video information the user is watching. The information of the two program frames corresponding to the highest similarity values is then extracted, and the first time interval between these two frames within the program is calculated; at the same time, the second time interval between the two frame pictures in the video information shot by the phone (or collected when the user took the photos) is obtained. If the first time interval equals the second time interval, or their difference is less than a preset interval threshold, it is determined that the video information the user is watching is this program, and the corresponding comment interface pops up on the user's phone for the user to comment.
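The matching logic of S21 through S24 (best-matching program frame per snapshot, a similarity threshold, then a time-interval check) can be sketched with arrays standing in for decoded frames. The similarity metric used here (one minus the mean absolute pixel difference) and the threshold values are illustrative assumptions, not the patent's:

```python
import numpy as np

def best_match(snapshot, program_frames):
    """S22: return (index, similarity) of the program frame most similar to snapshot."""
    sims = [1.0 - np.abs(snapshot - f).mean() for f in program_frames]
    i = int(np.argmax(sims))
    return i, sims[i]

def matches_program(snaps, snap_times, program_frames, frame_times,
                    sim_threshold=0.9, interval_tolerance=0.5):
    """S23/S24: both snapshots must match well AND the time intervals must agree."""
    hits = [best_match(s, program_frames) for s in snaps]
    if any(sim < sim_threshold for _, sim in hits):
        return False
    prog_interval = abs(frame_times[hits[1][0]] - frame_times[hits[0][0]])
    snap_interval = abs(snap_times[1] - snap_times[0])
    return abs(prog_interval - snap_interval) <= interval_tolerance

# Toy "frames": constant-valued 2x2 images; the program advances one frame per 5 s.
frames = [np.full((2, 2), v / 10) for v in range(10)]
times = [5.0 * i for i in range(10)]
snaps = [frames[2].copy(), frames[3].copy()]  # two shots taken 5 s apart
print(matches_program(snaps, [0.0, 5.0], frames, times))  # True
```

The interval check is what distinguishes the correct program from another that merely contains two visually similar frames: the frames must also be the right distance apart in time.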
In one embodiment, before the above step of uploading the text and the expression to the comment interface, the method comprises:
S531: controlling the camera of the terminal to collect facial information of the user;
S532: inputting the facial information into a preset facial mood recognition model, and outputting a second mood corresponding to the facial information;
S533: judging whether the first mood and the second mood are the same;
S534: if so, generating an instruction to upload the text and the expression to the comment interface.
In this embodiment, when the user is in different moods, the muscles and organs of the face take different forms. The different forms of the facial information can likewise determine the user's mood, and are used here to confirm whether the mood computed from the voice information by the voice mood recognition model is correct. The server generates an instruction to collect facial information and then controls the camera of the terminal to start, with priority given to the front camera of the terminal, to collect an image of the surroundings and recognize the largest face in the image, i.e., the facial information of the user. The facial information is then input into the facial mood recognition model, which outputs the user's second mood based on the facial information. The server then compares the second mood with the first mood to see whether the two are consistent; if so, it generates an instruction to upload the text and the expression to the comment interface, and then controls the text and the expression corresponding to the first mood to be uploaded to the comment interface.
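The cross-check in S531 through S534 is a simple gate: the expression-tagged comment is uploaded only if the voice-derived and face-derived moods agree. A minimal sketch, with the facial mood recognition model replaced by a callable so that the gating logic stands alone (the stub and all names are hypothetical):

```python
from typing import Callable, Optional, Tuple

def gated_upload(voice_mood: str,
                 face_mood_of: Callable[[bytes], str],
                 face_image: bytes,
                 text: str,
                 expression: str) -> Optional[Tuple[str, str]]:
    """Return (text, expression) to upload only if the two moods agree (S533/S534)."""
    second_mood = face_mood_of(face_image)  # preset facial mood recognition model
    if second_mood == voice_mood:           # S533: are the moods the same?
        return (text, expression)           # S534: generate the upload instruction
    return None                             # moods disagree: withhold the upload

# Stub standing in for the facial mood recognition model:
stub_face_model = lambda img: "happy"
print(gated_upload("happy", stub_face_model, b"<jpeg>", "great show", ":smile:"))
# ('great show', ':smile:')
```

The patent leaves unspecified what happens when the moods disagree; returning `None` (withhold the expression) is one plausible reading.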
In one embodiment, the above step of loading the comment interface for the program information onto the terminal comprises:
S31: obtaining the comment type of the user;
S32: loading the comment interface corresponding to the comment type onto the terminal.
In this embodiment, comment types can be divided into multiple kinds at different levels. In one embodiment, comment types are divided, according to whether spoiler information is included, into a spoiler type and a non-spoiler type. After the server analyzes and obtains the program information, it calls up the program information, finds the comment interface corresponding to it, and loads the two comment types onto the terminal for the user to select. After receiving the user's selection, the comment interface of the comment type selected by the user is loaded onto the terminal. When the user views the comments in the comment interface, the user sees only the comments he or she wishes to see. In particular, this prevents a user who dislikes spoilers from seeing spoiler comments while browsing, giving the user a better experience.
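On the server side, the spoiler / non-spoiler split of S31 and S32 reduces to a filter over tagged comments. A sketch under the assumption (not stated in the patent) that each stored comment carries a `spoiler` flag:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Comment:
    text: str
    spoiler: bool

def comment_interface(comments: List[Comment], comment_type: str) -> List[str]:
    """S32: load only the comments matching the user's selected comment type."""
    if comment_type == "non-spoiler":
        return [c.text for c in comments if not c.spoiler]
    return [c.text for c in comments]  # spoiler type: show everything

stored = [Comment("great cast", False), Comment("the butler did it", True)]
print(comment_interface(stored, "non-spoiler"))  # ['great cast']
```

Other splits the passage alludes to ("multiple kinds at different levels") would just add more tags and more filter branches.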
In one embodiment, before the above step of uploading the text to the comment interface, the method comprises:
S501: inputting sample data into a neural network model, the sample data including multiple pieces of voice information and the mood corresponding to each piece of voice information;
S502: training the neural network model to obtain a neural-network-based voice mood recognition model.
In this embodiment, before the voice mood recognition model is used, it is first trained. A neural network model is adopted; staff collect multiple passages of voice information in different moods, and each piece of voice information together with its corresponding mood constitutes one sample. Each piece of voice information and its corresponding mood features are input into the neural network model, which performs inductive calculation on the voice information corresponding to the same mood features and derives the voice-to-mood mapping for each mood, thereby constituting the neural-network-based voice mood recognition model.
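S501 and S502 describe fitting a model that maps voice features to moods. As a stand-in for the neural network (whose architecture the patent does not specify), the same train-then-predict shape can be shown with a nearest-centroid classifier over toy feature vectors; the feature values and moods below are fabricated purely for illustration:

```python
import numpy as np

def train(samples):
    """S501/S502: 'inductive calculation' per mood, here one centroid per mood."""
    centroids = {}
    for mood in {m for _, m in samples}:
        feats = np.array([f for f, m in samples if m == mood])
        centroids[mood] = feats.mean(axis=0)
    return centroids

def predict(centroids, features):
    """Output the mood whose centroid is closest to the voice features."""
    return min(centroids, key=lambda m: float(np.linalg.norm(centroids[m] - features)))

# Toy samples: (feature vector [energy, f0 in Hz], mood label)
samples = [([0.9, 310.0], "angry"), ([0.8, 290.0], "angry"),
           ([0.3, 180.0], "sad"),   ([0.2, 170.0], "sad")]
model = train(samples)
print(predict(model, np.array([0.85, 300.0])))  # angry
```

A real system would replace the centroids with a trained network over the acoustic features listed earlier (energy, F0, formants, HNR, MFCCs), but the data-collection, training and inference stages divide up the same way.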
In conclusion the voice of the application inputted based on speech analysis to the prediction technique of program review, reception user Signal, and it is automatically converted into text, to be commented on, user's typewriting input is compared, the time of input comment is saved, does not influence to use Program is watched at family.When being converted into text, automatically according to the text in voice, expression corresponding with text is added, meanwhile, go back root Mood when voice is delivered according to user, it is automatic to add expression corresponding with mood, user is more intuitively expressed in this way to program Comment on emotion.The facial expression for also extracting user adds corresponding expression according to the facial expression of user.By multiple technologies, Get mood of the user when watching program, and corresponding expression added according to mood, more it is true rapidly to program into Row comment, and the experience of the viewing program of user is not influenced.
Referring to Fig. 2, an embodiment of this application also provides an apparatus for commenting on programs based on speech analysis, comprising:
a receiving module 1, configured to receive video information collected by a terminal;
an analysis module 2, configured to analyze the video information to obtain the program information corresponding to the video information;
a loading module 3, configured to load a comment interface for the program information onto the terminal, the comment interface including a voice port through which a user inputs a comment;
a conversion module 4, configured to convert the voice information input by the user through the voice port into text;
an uploading module 5, configured to upload the text to the comment interface.
In this embodiment, the video information collected by the terminal may be video information played by the terminal itself, or video information generated by the terminal shooting with its camera. In the first case, when the terminal plays video information, the receiving module 1 loads onto the terminal a window for sending the video information; after the user clicks this window, the terminal sends the video information to the receiving module 1, which receives the video information sent by the terminal. In the second case, the receiving module 1 loads onto the terminal a window for shooting video information; after the user clicks the window, the terminal starts the camera, shoots, and sends the shot video information to the receiving module 1 in real time, which receives the video information shot by the terminal.
After the receiving module 1 receives the video information: in the first case, the video information includes the video content and various items of information related to the video, such as video duration, bit rate, frame rate and video name, where the video content refers to the program information, such as the title of a TV series, variety show or movie. The analysis module 2 reads the video content in the video information and analyzes it to obtain the program information. In the second case, the server reads the pictures in the video information, and the analysis module 2 compares them with multiple programs stored on the server, finds the program whose frames are identical, and then reads the information of that program, thereby analyzing out the program information corresponding to the video information. Further, in the second case, the sound in the video information can also be converted into text; that is, the sound in the video information is collected, and the program information corresponding to the video information is judged from the text corresponding to the sound. Meanwhile, after the server obtains the program information corresponding to the video information, it can also load other users' comments on the program onto the terminal, so that the user can view other users' comments on the program and exchange with other users.
After the analysis module 2 obtains the program information, the loading module 3 calls up the comment interface for commenting on this program information on the server and then loads it onto the terminal. When the server has no comment interface for this program information, the loading module 3 calls the comment template on the server, loads the program information into the corresponding position in the comment template to form the comment interface for this program information, and then loads that comment interface onto the terminal. The comment interface has a voice port through which the user inputs comments; the user inputs voice through this port to comment on the program information.
When the user inputs voice to comment on the program, conversion module 4 converts the voice information received by the terminal through the port into text by speech analysis techniques, and uploading module 5 then uploads the text to the comment interface. In this way the user can quickly comment on the program being watched by voice, without interrupting viewing.
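The convert-and-upload flow of modules 4 and 5 can be sketched as follows. The patent does not name a speech-recognition backend, so `transcribe` below is a hypothetical stand-in (a simple lookup) used purely to illustrate the control flow:

```python
# Minimal sketch of the convert-and-upload flow (conversion module 4 and
# uploading module 5). transcribe() stands in for a real speech-to-text
# backend, which the patent does not specify.

def transcribe(audio_bytes: bytes) -> str:
    """Hypothetical speech-to-text stand-in: map raw audio to text."""
    fake_asr = {b"\x01\x02": "this show is wonderful"}
    return fake_asr.get(audio_bytes, "")

def upload_comment(comment_interface: list, audio_bytes: bytes) -> str:
    """Convert the voice message to text and append it to the interface."""
    text = transcribe(audio_bytes)
    if text:
        comment_interface.append(text)
    return text

interface = []
upload_comment(interface, b"\x01\x02")
```

In a real deployment the lookup would be replaced by a call to whatever speech-recognition service the server uses; the interface list stands in for the comment interface on the terminal.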
In one embodiment, the uploading module 5 includes:
an obtaining-mood unit, for obtaining a first mood of the user according to the voice information;
a searching unit, for searching an expression library for the expression corresponding to the first mood;
an uploading unit, for uploading the text and the expression to the comment interface.
In the present embodiment, when the user sends voice to comment on a program that he likes or dislikes, or on a role in the program, the voice carries the user's own emotion. The obtaining-mood unit can therefore extract the user's mood from the voice information, that is, the user's first mood toward the program under review. Moods include many kinds, such as grieved, frightened, surprised, accepting, ecstatic, furious, vigilant and hateful. After the obtaining-mood unit obtains the user's first mood from the voice information, the searching unit loads the expressions corresponding to the first mood for the user to select. The uploading unit then uploads the text corresponding to the voice information and the corresponding expression to the comment interface, loading the text first and then the expression, so that the text and the expression appear together as one comment. This expresses the comment on the program information more intuitively and lets other users quickly grasp the user's comment on the program. The expression library contains multiple expressions; each expression carries one mood label, added by staff before the expression is uploaded to the library, and the library is stored in a designated space in the server. When sending a message, people are fastest with a voice message; when reading, pictures are read fastest, then text, then voice. This scheme converts the voice information into both text information and picture information, helping other users quickly understand the user's comment on the program.
In one embodiment, the above obtaining-mood unit includes:
an extracting subunit, for extracting the mood words in the text;
an obtaining subunit, for calling the correspondence between mood words and moods according to the mood words, to obtain the first mood of the user.
In the present embodiment, the above text refers to the text converted from the voice information. The mood words are stored in a mood dictionary. After the server converts the voice information into text, the extracting subunit extracts the words in the text that are identical to the words in the mood dictionary, that is, extracts the mood words in the text. The obtaining subunit then calls the correspondence between mood words and moods to obtain the first mood of the user. The mood words are sorted and collected by staff, and each mood word is mapped to a mood; one mood can correspond to multiple mood words. In one embodiment, the correspondence between mood words and moods is as follows:
Mood word                      Mood
tired, gloomy, nauseous        angry
cheering, dancing, singing     happy
terror, trembling              fear
crying, tears, anxiety         sad
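The extract/obtain subunits amount to a keyword lookup against the correspondence above. A minimal sketch, with the dictionary entries taken from that table (the exact word list is staff-curated and this one is only illustrative):

```python
# Sketch of the extracting and obtaining subunits: pull out words that also
# appear in the mood dictionary, then map the first match to its mood.
# One mood can own many mood words.

MOOD_DICTIONARY = {
    "tired": "angry", "gloomy": "angry", "nauseous": "angry",
    "cheering": "happy", "dancing": "happy", "singing": "happy",
    "terror": "fear", "trembling": "fear",
    "crying": "sad", "tears": "sad", "anxiety": "sad",
}

def first_mood_from_text(text):
    """Return the first mood found in the text, or None if no mood word matches."""
    for word in text.lower().split():
        word = word.strip(".,!?")
        if word in MOOD_DICTIONARY:
            return MOOD_DICTIONARY[word]
    return None
```

Returning `None` when no mood word is found lets the caller skip the expression lookup and upload the text alone.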
In one embodiment, the above obtaining-mood unit includes:
a model subunit, for inputting the voice information into a preset voice emotion recognition model and outputting the first mood corresponding to the voice information.
In the present embodiment, when the user speaks with different moods, the signals of the corresponding voice information differ. When the user says two passages with the same mood, the energy features, pronunciation frame-number features, fundamental frequency features, formant features, harmonic-to-noise-ratio features and mel cepstrum coefficient features of the two passages of voice information show obvious commonality. The model subunit inputs the voice information into the voice emotion recognition model, which calculates and outputs the first mood corresponding to the above voice information.
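The patent lists the acoustic features but fixes no extraction method or model architecture. As an illustration only, the sketch below computes two cheap frame-level features (RMS energy and zero-crossing rate) from a raw sample list and feeds them to a toy threshold rule standing in for the trained voice emotion recognition model:

```python
# Illustrative stand-in for the voice emotion recognition model. The real
# model would use energy, pitch, formant, HNR and MFCC features with a
# trained classifier; the features and rule here are assumptions for demo.
import math

def frame_features(samples, frame_len=160):
    """Per-frame (RMS energy, zero-crossing rate) over non-overlapping frames."""
    feats = []
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / frame_len)
        zcr = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0) / frame_len
        feats.append((rms, zcr))
    return feats

def toy_emotion_model(feats):
    """Hypothetical classifier: loud speech -> 'angry', quiet speech -> 'calm'."""
    mean_rms = sum(f[0] for f in feats) / len(feats)
    return "angry" if mean_rms > 0.5 else "calm"

loud = [0.9 if i % 2 else -0.9 for i in range(320)]
```

A production system would replace `toy_emotion_model` with a model trained on labeled emotional speech; the point here is only the features-then-classifier pipeline shape.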
In one embodiment, the above video information is video information shot by a camera of the terminal, and the above analysis module 2 includes:
an extraction unit, for extracting at least two frame pictures of the video information and a program video in the server;
a first computing unit, for performing similarity calculation between the at least two frame pictures and each frame in the program video respectively, to obtain at least two program frames in one-to-one correspondence with the at least two frame pictures, and the corresponding similarity values;
a second computing unit, for calculating, if the at least two similarity values are all above a preset similarity threshold, whether the time interval of the at least two frame pictures and that of the corresponding at least two program frames are identical;
a judging unit, for determining, if the time interval of the at least two frame pictures is identical to that of the corresponding at least two program frames, that the program is the program corresponding to the video information.
In the present embodiment, the terminal is the user's mobile phone. When watching TV, the user sees a variety show and wants to comment on it, so he picks up the phone and opens the function or application for commenting on programs, which controls the camera to take pictures or record video. If taking pictures, at least two are taken, with an interval of at least 5 seconds between them; if recording, at least 5 seconds of video are recorded and the first and last frame pictures of the recording are chosen. The extraction unit thus extracts two frame pictures of the video information through the phone. Meanwhile, the first computing unit accesses the server specified in the function or application, extracts a program video from the server, and calculates the similarity between the two frame pictures and all frame pictures of the program video; each frame picture of the video information is compared once against all frame pictures of the program video, and the highest similarity value is chosen, yielding two similarity values. It then judges whether both similarity values are above a preset similarity threshold; if so, the program in the server may be the same as the video the user is watching. The second computing unit then extracts the information of the image frames in the program video corresponding to the two highest similarity values and calculates the first time interval between the two most similar frame pictures in the program video, while also obtaining the second time interval of the phone shooting (or between the two frame pictures of the acquired video). If the first time interval equals the second time interval, or their gap is less than a preset interval threshold, the judging unit determines that the video information seen by the user is that program, and the comment interface correspondingly pops up on the user's phone for the user to comment.
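The two-stage check above (best-match similarity per captured frame, then time-interval agreement) can be sketched as follows. Frames are flattened pixel lists and the similarity measure is a simple normalized element overlap, standing in for whatever measure the server actually uses:

```python
# Sketch of the first/second computing units and the judging unit: find the
# best-matching program frame for each captured picture, require both matches
# to clear a similarity threshold, then require the capture interval and the
# program-video interval to agree within a threshold. The similarity measure
# and thresholds are illustrative assumptions.

def similarity(a, b):
    same = sum(1 for x, y in zip(a, b) if x == y)
    return same / max(len(a), len(b))

def match_program(captured, program, sim_threshold=0.8, interval_threshold=1.0):
    # captured: [(timestamp, frame), (timestamp, frame)]
    # program:  [(timestamp, frame), ...] for the stored program video
    best = []
    for t_cap, frame in captured:
        t_prog, score = max(
            ((t, similarity(frame, f)) for t, f in program),
            key=lambda pair: pair[1],
        )
        best.append((t_cap, t_prog, score))
    if any(score < sim_threshold for _, _, score in best):
        return False
    first_interval = abs(best[1][1] - best[0][1])   # interval in the program video
    second_interval = abs(best[1][0] - best[0][0])  # interval between the captures
    return abs(first_interval - second_interval) <= interval_threshold

program = [(0.0, [1, 1, 1]), (5.0, [2, 2, 2]), (10.0, [3, 3, 3])]
captured = [(100.0, [1, 1, 1]), (105.0, [2, 2, 2])]
```

The interval check is what rejects a program that merely happens to contain two similar-looking frames at the wrong temporal spacing.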
In one embodiment, the above device for commenting on programs based on speech analysis further includes:
an acquisition module, for controlling the camera of the terminal to acquire facial information of the user;
a mood identification module, for inputting the facial information into a preset facial emotion recognition model and outputting the second mood corresponding to the facial information;
a judgment module, for judging whether the first mood and the second mood are identical;
an instruction module, for generating, if the first mood is identical to the second mood, an instruction to upload the text and the expression to the comment interface.
In the present embodiment, when the user has different moods, the muscles and organs of the face take different forms, so the form of the facial information can likewise determine the user's mood; here it is used to confirm whether the mood calculated from the voice information by the voice emotion recognition model is correct. An instruction to acquire facial information is generated, and the acquisition module controls the camera of the terminal to start, preferentially the front camera, which captures the surrounding image; the largest face in the image is identified as the facial information of the user. The mood identification module then inputs the facial information into the facial emotion recognition model and outputs the second mood of the user calculated from the facial information. The judgment module compares the second mood with the first mood to see whether the two are consistent; if so, the instruction module generates an instruction to upload the text and the expression to the comment interface, and the text together with the expression corresponding to the first mood is uploaded to the comment interface.
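The cross-check above reduces to a simple gate: the facial ("second") mood only confirms the voice-derived ("first") mood before anything is uploaded. A minimal sketch, with both model outputs stubbed as precomputed labels:

```python
# Sketch of the judgment and instruction modules: upload the (text, expression)
# pair only when the voice-derived mood and the face-derived mood agree.
# The two mood labels are assumed to come from the respective models.

def should_upload(first_mood, second_mood):
    """Judgment module: the facial mood must confirm the voice mood."""
    return first_mood == second_mood

def upload_if_confirmed(interface, text, expression, first_mood, second_mood):
    """Instruction module: gate the upload on mood agreement."""
    if should_upload(first_mood, second_mood):
        interface.append((text, expression))
        return True
    return False
```

When the moods disagree, the comment is simply not uploaded in this sketch; a real system might instead fall back to uploading the text without an expression.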
In one embodiment, the above loading module 3 includes:
an obtaining-comment unit, for obtaining the comment type of the user;
a loading unit, for loading the comment interface in the program information corresponding to the comment type to the terminal.
In the present embodiment, comment types can be divided into multiple types at different levels. In one embodiment, comment types are divided into a spoiler type and a non-spoiler type according to whether they reveal plot information. After the server analyzes and obtains the program information, it calls the program information; the obtaining-comment unit finds the comment interface corresponding to the program information and loads the two comment types to the terminal for the user to select. After receiving the user's selection, the loading unit loads the comment interface of the selected comment type to the terminal. When viewing the comments in the interface, the user thus sees only the comments he wishes to see. In particular, this prevents a user who dislikes spoilers from seeing spoiler comments while watching, giving the user a better experience.
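The type-filtered loading above is a straightforward filter over tagged comments. A minimal sketch, with the tag values and sample comments as illustrative assumptions:

```python
# Sketch of the obtaining-comment and loading units: comments are tagged
# "spoiler" or "non-spoiler", and only those matching the user's chosen
# type are loaded, so spoiler-averse viewers never see spoiler comments.

COMMENTS = [
    {"text": "the butler did it", "type": "spoiler"},
    {"text": "great cinematography", "type": "non-spoiler"},
]

def load_comment_interface(comments, chosen_type):
    """Return only the comment texts matching the user's selected type."""
    return [c["text"] for c in comments if c["type"] == chosen_type]
```

The same shape generalizes to the "multiple types at different levels" the embodiment mentions: the tag just becomes a richer label and the predicate a membership test.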
In conclusion the voice of the application inputted based on speech analysis to the prediction meanss of program review, reception user Signal, and it is automatically converted into text, to be commented on, user's typewriting input is compared, the time of input comment is saved, does not influence to use Program is watched at family.When being converted into text, automatically according to the text in voice, expression corresponding with text is added, meanwhile, go back root Mood when voice is delivered according to user, it is automatic to add expression corresponding with mood, user is more intuitively expressed in this way to program Comment on emotion.The facial expression for also extracting user adds corresponding expression according to the facial expression of user.By multiple technologies, Get mood of the user when watching program, and corresponding expression added according to mood, more it is true rapidly to program into Row comment, and the experience of the viewing program of user is not influenced.
Referring to Fig. 3, an embodiment of the present application also provides a computer equipment, which may be a server whose internal structure may be as shown in Fig. 3. The computer equipment includes a processor, a memory, a network interface and a database connected by a system bus. The processor of the computer equipment provides calculating and control capability. The memory of the computer equipment includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program and a database. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The database of the computer equipment stores data such as video information and program information. The network interface of the computer equipment communicates with an external terminal through a network connection. When executed by the processor, the computer program realizes a method for commenting on programs based on speech analysis.
The above processor executes the steps of the above method for commenting on programs based on speech analysis: receiving the video information acquired by the terminal; analyzing the video information to obtain the program information corresponding to the video information; loading the comment interface in the program information to the terminal, the comment interface including a voice port through which the user inputs comments; converting the voice information input by the user through the voice port into text; and uploading the text to the comment interface.
In one embodiment, when the above processor executes the step of uploading the text to the comment interface, the step includes: obtaining the first mood of the user according to the voice information; searching, according to the first mood, the expression library for the expression corresponding to the first mood; and uploading the text and the expression to the comment interface.
In one embodiment, when the above processor executes the step of obtaining the first mood of the user according to the voice information, the step includes: extracting the mood words in the text; and calling, according to the mood words, the correspondence between mood words and moods to obtain the first mood of the user.
In one embodiment, when the above processor executes the step of obtaining the first mood of the user according to the voice information, the step includes: inputting the voice information into a preset voice emotion recognition model and outputting the first mood corresponding to the voice information.
In one embodiment, the above video information is video information shot by a camera of the terminal, and when the above processor executes the step of analyzing the video information to obtain the program information corresponding to the video information, the step includes: extracting at least two frame pictures of the video information and a program video in the server; performing similarity calculation between the at least two frame pictures and each frame in the program video respectively, to obtain at least two program frames in one-to-one correspondence with the at least two frame pictures and the corresponding similarity values; if the at least two similarity values are all above a preset similarity threshold, calculating whether the time interval of the at least two frame pictures and that of the corresponding at least two program frames are identical; and if so, determining that the program is the program corresponding to the video information.
In one embodiment, before the above processor executes the step of uploading the text and the expression to the comment interface, the method includes: controlling the camera of the terminal to acquire the facial information of the user; inputting the facial information into a preset facial emotion recognition model and outputting the second mood corresponding to the facial information; judging whether the first mood and the second mood are identical; and if so, generating an instruction to upload the text and the expression to the comment interface.
In one embodiment, when the above processor executes the step of loading the comment interface in the program information to the terminal, the step includes: obtaining the comment type of the user; and loading the comment interface in the program information corresponding to the comment type to the terminal.
In conclusion the computer equipment of the application, receives the voice signal of user's input, and it is automatically converted into text, To be commented on, user's typewriting input is compared, saves the time of input comment, user is not influenced and watches program.It is written converting When word, automatically according to the text in voice, expression corresponding with text is added, meanwhile, feelings when voice are delivered also according to user Thread, it is automatic to add expression corresponding with mood, user is more intuitively expressed in this way to the comment emotion of program.Also extract user's Facial expression adds corresponding expression according to the facial expression of user.By multiple technologies, user is got when watching program Mood, and corresponding expression is added according to mood, more really rapidly program is commented on, and do not influence user's Watch the experience of program.
It will be understood by those skilled in the art that the structure shown in Fig. 3 is merely a block diagram of the part of the structure relevant to the solution of the present application, and does not constitute a limitation on the computer equipment to which the solution of the present application is applied.
An embodiment of the present application also provides a computer readable storage medium on which a computer program is stored. When executed by a processor, the computer program realizes a method for commenting on programs based on speech analysis, specifically: receiving the video information acquired by the terminal; analyzing the video information to obtain the program information corresponding to the video information; loading the comment interface in the program information to the terminal, the comment interface including a voice port through which the user inputs comments; converting the voice information input by the user through the voice port into text; and uploading the text to the comment interface.
In one embodiment, when the above processor executes the step of uploading the text to the comment interface, the step includes: obtaining the first mood of the user according to the voice information; searching, according to the first mood, the expression library for the expression corresponding to the first mood; and uploading the text and the expression to the comment interface.
In one embodiment, when the above processor executes the step of obtaining the first mood of the user according to the voice information, the step includes: extracting the mood words in the text; and calling, according to the mood words, the correspondence between mood words and moods to obtain the first mood of the user.
In one embodiment, when the above processor executes the step of obtaining the first mood of the user according to the voice information, the step includes: inputting the voice information into a preset voice emotion recognition model and outputting the first mood corresponding to the voice information.
In one embodiment, the above video information is video information shot by a camera of the terminal, and when the above processor executes the step of analyzing the video information to obtain the program information corresponding to the video information, the step includes: extracting at least two frame pictures of the video information and a program video in the server; performing similarity calculation between the at least two frame pictures and each frame in the program video respectively, to obtain at least two program frames in one-to-one correspondence with the at least two frame pictures and the corresponding similarity values; if the at least two similarity values are all above a preset similarity threshold, calculating whether the time interval of the at least two frame pictures and that of the corresponding at least two program frames are identical; and if so, determining that the program is the program corresponding to the video information.
In one embodiment, before the above processor executes the step of uploading the text and the expression to the comment interface, the method includes: controlling the camera of the terminal to acquire the facial information of the user; inputting the facial information into a preset facial emotion recognition model and outputting the second mood corresponding to the facial information; judging whether the first mood and the second mood are identical; and if so, generating an instruction to upload the text and the expression to the comment interface.
In one embodiment, when the above processor executes the step of loading the comment interface in the program information to the terminal, the step includes: obtaining the comment type of the user; and loading the comment interface in the program information corresponding to the comment type to the terminal.
In conclusion the computer readable storage medium of the application, receives the voice signal of user's input, and automatic conversion At text, to be commented on, user's typewriting input is compared, saves the time of input comment, not influenced user and watch program.Turning When changing text into, automatically according to the text in voice, expression corresponding with text is added, meanwhile, voice is delivered also according to user When mood, automatic addition expression corresponding with mood is more intuitive in this way to express user to the comment emotion of program.Also extract The facial expression of user adds corresponding expression according to the facial expression of user.By multiple technologies, gets user and watching Mood when program, and corresponding expression is added according to mood, more really rapidly program is commented on, and does not influence The experience of the viewing program of user.
Those of ordinary skill in the art will appreciate that all or part of the processes in the above embodiment methods can be completed by instructing relevant hardware through a computer program, which can be stored in a non-volatile computer readable storage medium; when executed, the computer program may include the processes of the embodiments of the above methods. Any reference to memory, storage, database or other media used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable ROM (EEPROM) or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link (Synchlink) DRAM (SLDRAM), memory bus (Rambus) direct RAM (RDRAM), direct memory bus dynamic RAM (DRDRAM) and memory bus dynamic RAM (RDRAM), etc.
It should be noted that, in this document, the terms "include", "comprise" or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, device, article or method including a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, device, article or method. In the absence of more restrictions, an element limited by the sentence "including a ..." does not exclude the existence of other identical elements in the process, device, article or method that includes that element.
The foregoing is merely a preferred embodiment of the present application and is not intended to limit the patent scope of the application; any equivalent structure or equivalent process transformation made using the contents of the present specification and accompanying drawings, applied directly or indirectly in other related technical fields, is likewise included in the patent protection scope of the present application.

Claims (10)

1. A method for commenting on programs based on speech analysis, characterized by comprising:
receiving video information acquired by a terminal;
analyzing the video information to obtain program information corresponding to the video information;
loading a comment interface in the program information to the terminal, the comment interface including a voice port through which a user inputs comments;
converting voice information input by the user through the voice port into text;
uploading the text to the comment interface.
2. The method for commenting on programs based on speech analysis as claimed in claim 1, characterized in that the step of uploading the text to the comment interface comprises:
obtaining a first mood of the user according to the voice information;
searching, according to the first mood, an expression library for an expression corresponding to the first mood;
uploading the text and the expression to the comment interface.
3. The method for commenting on programs based on speech analysis as claimed in claim 2, characterized in that the step of obtaining the first mood of the user according to the voice information comprises:
extracting mood words in the text;
calling, according to the mood words, a correspondence between mood words and moods to obtain the first mood of the user.
4. The method for commenting on programs based on speech analysis as claimed in claim 2, characterized in that the step of obtaining the first mood of the user according to the voice information comprises:
inputting the voice information into a preset voice emotion recognition model and outputting the first mood corresponding to the voice information.
5. The method for commenting on programs based on speech analysis as claimed in claim 1, characterized in that the video information is video information shot by a camera of the terminal, and the step of analyzing the video information to obtain the program information corresponding to the video information comprises:
extracting at least two frame pictures of the video information and a program video in a server;
performing similarity calculation between the at least two frame pictures and each frame in the program video respectively, to obtain at least two program frames in one-to-one correspondence with the at least two frame pictures, and corresponding similarity values;
if the at least two similarity values are all above a preset similarity threshold, calculating whether a time interval of the at least two frame pictures and that of the corresponding at least two program frames are identical;
if so, determining that the program is the program corresponding to the video information.
6. The method for commenting on programs based on speech analysis as claimed in claim 2, characterized in that, before the step of uploading the text and the expression to the comment interface, the method comprises:
controlling a camera of the terminal to acquire facial information of the user;
inputting the facial information into a preset facial emotion recognition model and outputting a second mood corresponding to the facial information;
judging whether the first mood and the second mood are identical;
if so, generating an instruction to upload the text and the expression to the comment interface.
7. The method for commenting on programs based on speech analysis as claimed in claim 1, characterized in that the step of loading the comment interface in the program information to the terminal comprises:
obtaining a comment type of the user;
loading the comment interface in the program information corresponding to the comment type to the terminal.
8. A device for commenting on programs based on speech analysis, characterized by comprising:
a receiving module, for receiving video information acquired by a terminal;
an analysis module, for analyzing the video information to obtain program information corresponding to the video information;
a loading module, for loading a comment interface in the program information to the terminal, the comment interface including a voice port through which a user inputs comments;
a conversion module, for converting voice information input by the user through the voice port into text;
an uploading module, for uploading the text to the comment interface.
9. A computer equipment, comprising a memory and a processor, the memory storing a computer program, characterized in that the processor realizes the steps of the method of any one of claims 1 to 7 when executing the computer program.
10. A computer readable storage medium on which a computer program is stored, characterized in that the computer program realizes the steps of the method of any one of claims 1 to 7 when executed by a processor.
CN201910651425.1A 2019-07-18 2019-07-18 Based on speech analysis to the method, apparatus and computer equipment of program review Pending CN110460903A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910651425.1A CN110460903A (en) 2019-07-18 2019-07-18 Based on speech analysis to the method, apparatus and computer equipment of program review
PCT/CN2019/116702 WO2021008025A1 (en) 2019-07-18 2019-11-08 Speech recognition-based information analysis method and apparatus, and computer device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910651425.1A CN110460903A (en) 2019-07-18 2019-07-18 Based on speech analysis to the method, apparatus and computer equipment of program review

Publications (1)

Publication Number Publication Date
CN110460903A true CN110460903A (en) 2019-11-15

Family

ID=68481411

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910651425.1A Pending CN110460903A (en) 2019-07-18 2019-07-18 Based on speech analysis to the method, apparatus and computer equipment of program review

Country Status (2)

Country Link
CN (1) CN110460903A (en)
WO (1) WO2021008025A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114760257A (en) * 2021-01-08 2022-07-15 上海博泰悦臻网络技术服务有限公司 Commenting method, electronic device and computer readable storage medium

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117153151B (en) * 2023-10-09 2024-05-07 广州易风健康科技股份有限公司 Emotion recognition method based on user intonation

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103984778A (en) * 2014-06-06 2014-08-13 北京金山网络科技有限公司 Video retrieval method and video retrieval system
CN104113787A (en) * 2014-05-29 2014-10-22 腾讯科技(深圳)有限公司 Program-based commenting method, terminal, server, and program-based commenting system
CN105228013A (en) * 2015-09-28 2016-01-06 百度在线网络技术(北京)有限公司 Barrage information processing method, device and barrage video player
CN106570496A (en) * 2016-11-22 2017-04-19 上海智臻智能网络科技股份有限公司 Emotion recognition method and device and intelligent interaction method and device
CN107209876A (en) * 2014-11-20 2017-09-26 阿托姆票务有限责任公司 Cooperate with ticketing system
CN108322832A (en) * 2018-01-22 2018-07-24 广州市动景计算机科技有限公司 Comment on method, apparatus and electronic equipment
CN108471541A (en) * 2018-02-01 2018-08-31 北京奇艺世纪科技有限公司 A kind of method and device that video barrage is shown
CN109254669A (en) * 2017-07-12 2019-01-22 腾讯科技(深圳)有限公司 A kind of expression picture input method, device, electronic equipment and system

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008234431A (en) * 2007-03-22 2008-10-02 Toshiba Corp Comment accumulation device, comment creation browsing device, comment browsing system, and program
US20100251094A1 (en) * 2009-03-27 2010-09-30 Nokia Corporation Method and apparatus for providing comments during content rendering
KR101577540B1 (en) * 2014-07-21 2015-12-17 연세대학교 산학협력단 Method for providing shared replies for emotion sharing
CN104407834A (en) * 2014-11-13 2015-03-11 腾讯科技(成都)有限公司 Message input method and device
CN107577759B (en) * 2017-09-01 2021-07-30 安徽广播电视大学 Automatic recommendation method for user comments

Also Published As

Publication number Publication date
WO2021008025A1 (en) 2021-01-21

Similar Documents

Publication Publication Date Title
CN110444198B (en) Retrieval method, retrieval device, computer equipment and storage medium
US20210352380A1 (en) Characterizing content for audio-video dubbing and other transformations
CN109033257A (en) Talk about art recommended method, device, computer equipment and storage medium
CN110557659B (en) Video recommendation method and device, server and storage medium
CN109862397B (en) Video analysis method, device, equipment and storage medium
US20210369042A1 (en) Natural conversation storytelling system
CN105512348A (en) Method and device for processing videos and related audios and retrieving method and device
CN109543007A (en) Put question to data creation method, device, computer equipment and storage medium
US20140172419A1 (en) System and method for generating personalized tag recommendations for tagging audio content
KR20160104635A (en) Methods, systems, and media for generating search results based on contextual information
CN110136721A (en) A kind of scoring generation method, device, storage medium and electronic equipment
US11790271B2 (en) Automated evaluation of acting performance using cloud services
CN114245203B (en) Video editing method, device, equipment and medium based on script
US11107465B2 (en) Natural conversation storytelling system
CN110505504B (en) Video program processing method and device, computer equipment and storage medium
CN112423133B (en) Video switching method and device, computer readable storage medium and computer equipment
CN110460903A (en) Based on speech analysis to the method, apparatus and computer equipment of program review
CN112041809A (en) Automatic addition of sound effects to audio files
CN110769312A (en) Method and device for recommending information in live broadcast application
CN114155860A (en) Abstract recording method and device, computer equipment and storage medium
Lee et al. Implementation of robot journalism by programming custombot using tokenization and custom tagging
US20210279427A1 (en) Systems and methods for generating multi-language media content with automatic selection of matching voices
US20230326369A1 (en) Method and apparatus for generating sign language video, computer device, and storage medium
US20210337274A1 (en) Artificial intelligence apparatus and method for providing visual information
CN113301352B (en) Automatic chat during video playback

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20191115)