CN100596186C - An interactive digital multimedia making method based on video and audio - Google Patents


Info

Publication number
CN100596186C
CN100596186C · CN200610081465A
Authority
CN
China
Prior art keywords
video
information
audio
image
audio frequency
Prior art date
Legal status
Expired - Fee Related
Application number
CN200610081465A
Other languages
Chinese (zh)
Other versions
CN101079996A (en)
Inventor
侯启槟
王阳生
曾祥永
鲁鹏
Current Assignee
Beijing Interjoy Technology Limited
Original Assignee
BEIJING INTERJOY TECHNOLOGY Ltd
Priority date
Filing date
Publication date
Application filed by BEIJING INTERJOY TECHNOLOGY Ltd
Priority to CN200610081465A
Publication of CN101079996A
Application granted
Publication of CN100596186C

Abstract

This invention discloses an interactive digital multimedia production method based on audio and video, comprising: (1) capturing live video images in real time and preprocessing them to obtain preliminary video information; (2) converting the preliminary video information into video control information; (3) capturing live audio data in real time and preprocessing it to obtain preliminary audio information; (4) converting the preliminary audio information into audio control information; and (5) fusing the video and audio control information, changing the content of the target multimedia file, and outputting the result. Steps 1 and 2 form a first step group and steps 3 and 4 form a second; the two groups are independent of each other, and both proceed to step 5 on completion.

Description

An interactive digital multimedia production method based on video and audio
Technical field
The present invention relates to a computer human-machine interaction method, and in particular to an interactive digital multimedia production method based on video and audio.
Background technology
In recent years, with advances in information technology, the wide adoption of multimedia technology, and the rapid development of the communication-media industry, the concepts and forms of media publishing (such as advertising) have multiplied and diversified. However, once a traditional media release is fixed in concept and form, it suffers from uniformity, one-way delivery, and repetition. Although progress in computer vision and speech recognition has made natural human-machine interaction through vision and voice feasible, it remains a challenge to let the audience interact with media releases without physical contact, to let a release incorporate the motion and sound of the audience and the surrounding scene, and to let those interactions drive different changes in the released content, thereby improving the interactivity and appeal of the release. This is the challenge faced when producing such multimedia files.
Summary of the invention
The technical problem to be solved by the present invention is to provide an interactive digital multimedia production method based on video and audio, so that multimedia files are produced through human-machine interaction.
To solve the above technical problem, the present invention comprises the following steps. Start. Step 1: capture live video images in real time through a digital optical device and preprocess them to obtain preliminary video information. Step 2: convert the preliminary video information obtained in step 1 into video control information. Step 3: capture live audio data in real time through a digital audio device and preprocess it to obtain preliminary audio information. Step 4: convert the preliminary audio information obtained in step 3 into audio control information. Steps 1 and 2 form a first step group executed in order, and steps 3 and 4 form a second step group executed in order; the two groups are independent of each other and may or may not run concurrently, but in either case both proceed to step 5 on completion. Step 5: fuse and process the video control information and audio control information, output a control command for the body, drive the body through a control interface according to the command, and change and output the body content, where the body refers to the multimedia file. End.
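The two independent step groups converging on a fusion step can be sketched as two concurrent pipelines feeding one queue. This is only an illustration of the control flow; the payload fields and values below are hypothetical and not taken from the patent.

```python
import queue
import threading

def video_pipeline(out_q):
    """Steps 1-2: capture frames, preprocess, convert to video control info."""
    preliminary_video = {"motion": "wave", "amplitude": 0.7}   # hypothetical data
    out_q.put(("video", preliminary_video))

def audio_pipeline(out_q):
    """Steps 3-4: capture audio, preprocess, convert to audio control info."""
    preliminary_audio = {"keyword": "start", "pitch_hz": 220}  # hypothetical data
    out_q.put(("audio", preliminary_audio))

def fuse(out_q):
    """Step 5: collect both control streams, whichever order they arrive in."""
    commands = {}
    for _ in range(2):
        channel, info = out_q.get()
        commands[channel] = info
    return commands

q = queue.Queue()
threads = [threading.Thread(target=f, args=(q,))
           for f in (video_pipeline, audio_pipeline)]
for t in threads:
    t.start()
for t in threads:
    t.join()
result = fuse(q)
print(sorted(result))  # ['audio', 'video']
```

The groups may equally run sequentially; the fusion step only requires that both channels have delivered their control information.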
Because the present invention adopts interactive control by video and audio and converts the results into control commands for the multimedia file, it achieves direct control of the virtual elements in the multimedia file.
Description of drawings
Fig. 1 is a flow chart of the method applied to advertisement production;
Fig. 2 is a flow chart of the fusion and output of the control information in Fig. 1, i.e., the mapping of the video and audio analysis and recognition results into the corresponding advertisement controls.
Embodiment
The present invention is described in further detail below with reference to the drawings and specific embodiments.
In principle, the method can be divided into a video-based interactive digital multimedia production method and an audio-based one.
The video-based interactive digital multimedia production method comprises the following steps:
1. Capture video images in real time with a camera device and apply preprocessing such as light correction and denoising;
2. Segment the video image using its variation and features in time and space; extract and analyze features from the segmented image to obtain the global motion information and local human-posture information in the image (position, direction, amplitude, and the basic shape parameters they form); regularize this information and convert it into advertisement control commands;
3. The control interface drives the advertisement according to the control commands.
The audio-based interactive digital multimedia production method comprises the following steps:
1. Collect audio data in real time from a microphone and sound card and apply preprocessing such as denoising;
2. Process the collected audio with pitch analysis and speech recognition to obtain the frequency value and amplitude value of the sound and the recognized semantic-vocabulary result, and convert them into advertisement control commands;
3. The control interface drives the advertisement according to the control commands.
It must be emphasized that the two methods can be used independently or in combination.
The present invention is further illustrated below by an embodiment that applies the method to advertisement production. Fig. 1 is the flow chart of this embodiment, in which steps (1)-(5) and steps (6)-(10) can be used separately or run in parallel.
As shown in Fig. 1, the concrete steps of this embodiment are as follows:
(1) Obtain video images: capture real-time images from a camera connected to the computer through a high-speed image-capture module. Since every frame must be processed, images are extracted frame by frame from the video stream. Depending on the application, the camera may be aimed at the people and scenery in the venue, or may shoot them from above;
(2) Denoising and other preprocessing: to improve the precision and speed of the subsequent motion-information and posture-information extraction, the frames obtained in step (1) must be preprocessed. First, to reduce the computational load and improve speed, the resolution of the captured color image is reduced to 1/4 of the original and the image is converted to a 256-level grayscale image. Second, each frame is smoothed by averaging corresponding pixels in space (within the frame) and in time (between frames), removing the random noise introduced during capture. In addition, brightness is compensated to eliminate the influence of illumination changes: the mean of the whole image's pixel values is subtracted from the value of each pixel, the result is divided by the variance of the whole image's pixel values, and it is then multiplied by a coefficient. This processing eliminates the influence of lighting changes to a certain extent;
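The preprocessing chain of step (2) can be sketched in NumPy roughly as follows. The scaling coefficient and the 3x3 spatial smoothing window are assumptions, since the patent fixes neither, and the inter-frame averaging is omitted for brevity.

```python
import numpy as np

def preprocess(frame_rgb, coeff=64.0):
    """Sketch of step (2): downsample, grayscale, smooth, normalize.
    `coeff` is an assumed scaling constant not specified in the source."""
    # Reduce resolution to 1/4 of the original (half in each dimension).
    small = frame_rgb[::2, ::2]
    # Convert to a 256-level grayscale image (simple channel average).
    gray = small.mean(axis=2)
    # Smooth with a 3x3 spatial mean to suppress random capture noise.
    pad = np.pad(gray, 1, mode="edge")
    h, w = gray.shape
    smooth = sum(pad[dy:dy + h, dx:dx + w]
                 for dy in range(3) for dx in range(3)) / 9.0
    # Illumination compensation: subtract the global mean, divide by the
    # variance, then rescale, exactly as the text specifies.
    return (smooth - smooth.mean()) / (smooth.var() + 1e-8) * coeff

frame = np.random.randint(0, 256, (120, 160, 3)).astype(float)
out = preprocess(frame)
print(out.shape)  # (60, 80)
```

Note the text prescribes division by the variance rather than the standard deviation; the sketch follows the text as written.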
(3) Motion-information extraction: to support the subsequent posture-information extraction, global motion information must be extracted from the images processed in step (2). First, each pixel of the current frame is subtracted from the corresponding pixel of the previous frame and the absolute value of the difference is taken, yielding a frame-difference image describing inter-frame change. The frame-difference image is then thresholded: each pixel is judged to be greater than or equal to, or less than, a fixed threshold, producing a binary image describing the moving region (0 meaning below the threshold, 1 meaning at or above it). Finally, edge extraction is applied to this binary image to obtain the edges of the moving region. In addition, for a given fixed area, the amplitude, direction, and speed parameters of the region's motion can be derived from the proportion of 1-pixels in the region, the position of its center of gravity, and historical information;
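The frame-difference pipeline of step (3) can be sketched as follows. The threshold value and the particular edge rule (a moving pixel with at least one stationary 4-neighbour) are assumptions; the patent leaves both unspecified.

```python
import numpy as np

def motion_edges(prev_gray, curr_gray, threshold=15.0):
    """Sketch of step (3): frame difference, binary motion mask, region
    edges, and simple per-region motion statistics."""
    # Absolute inter-frame difference.
    diff = np.abs(curr_gray.astype(float) - prev_gray.astype(float))
    # Threshold into a binary mask: 1 = at/above threshold, 0 = below.
    binary = (diff >= threshold).astype(np.uint8)
    # Crude edge extraction: a moving pixel is an edge pixel if at least
    # one of its 4-neighbours is stationary.
    pad = np.pad(binary, 1)
    neigh_min = np.minimum.reduce([pad[:-2, 1:-1], pad[2:, 1:-1],
                                   pad[1:-1, :-2], pad[1:-1, 2:]])
    edges = binary & (neigh_min == 0)
    # Per-region summary: occupied ratio and centre of gravity, the raw
    # material for the amplitude/direction/speed parameters.
    ratio = binary.mean()
    ys, xs = np.nonzero(binary)
    centroid = (ys.mean(), xs.mean()) if len(ys) else None
    return binary, edges, ratio, centroid
```

Comparing centroids across successive frames (the "historical information" of the text) would then yield direction and speed.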
(4) Posture-information extraction: based on the motion information extracted in step (3), the moving foreground is further segmented and features are extracted from each region separately. The shape of the edge contour in a specific region of the binary image, and its process of change, are analyzed; features with rotation and scale invariance are extracted; the corresponding posture information is derived; and tracking, verification, and prediction are performed using the result from the previous moment;
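The patent does not name which rotation- and scale-invariant features are used in step (4). Normalized central moments are one common choice and serve here purely as an illustrative stand-in.

```python
import numpy as np

def shape_features(binary_region):
    """Illustrative shape descriptors for a segmented foreground region:
    scale-normalized central moments (not necessarily the features the
    patent intends)."""
    ys, xs = np.nonzero(binary_region)
    cy, cx = ys.mean(), xs.mean()   # centre of gravity
    area = float(len(ys))           # zeroth moment mu00
    feats = {}
    for p, q in [(2, 0), (0, 2), (1, 1)]:
        mu = ((ys - cy) ** p * (xs - cx) ** q).sum()
        # Scale normalization: eta_pq = mu_pq / mu00 ** (1 + (p + q) / 2)
        feats[(p, q)] = mu / area ** (1 + (p + q) / 2)
    # eta20 + eta02 is additionally rotation-invariant (first Hu moment).
    feats["hu1"] = feats[(2, 0)] + feats[(0, 2)]
    return feats
```

Tracking such descriptors across frames gives the "process of change" the text refers to.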
(5) Video control-parameter extraction and transformation: the global motion information and local human-posture information extracted in steps (3) and (4) are converted into the corresponding control information;
(6) Obtain audio data: collect real-time audio data through a microphone and sound card;
(7) Denoising and other preprocessing: the audio collected in real time is denoised by smoothing;
(8) Pitch-information extraction: pitch analysis is performed on the denoised audio to extract the frequency value and amplitude value of the sound;
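The patent does not specify the pitch-analysis algorithm for step (8); a windowed FFT peak is one simple stand-in. The sample rate and buffer size below are assumptions.

```python
import numpy as np

def tone_features(samples, sample_rate=8000):
    """Return the dominant frequency (Hz) and its spectral amplitude for
    one audio buffer; a minimal stand-in for the pitch-analysis step."""
    # Window to reduce spectral leakage, then take the magnitude spectrum.
    windowed = samples * np.hanning(len(samples))
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    peak = spectrum[1:].argmax() + 1   # skip the DC bin
    return freqs[peak], spectrum[peak]

# A synthetic 500 Hz tone stands in for microphone input.
t = np.arange(4096) / 8000.0
freq, amp = tone_features(np.sin(2 * np.pi * 500 * t))
print(freq)  # 500.0
```

A production system would track these values per frame and feed them to the control-parameter transformation of step (10).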
(9) Limited-vocabulary speech recognition: a speaker-independent, continuous speech recognition method is used to recognize a small set of discrete, latency-tolerant limited-vocabulary commands, such as "start" and "stop";
(10) Audio control-parameter extraction and transformation: the extracted pitch information and limited-vocabulary recognition results are converted into the corresponding control information;
(11) Command realization and multichannel fusion: the recognition results are mapped through a predefined command set into advertisement control information, and the control information of video and audio is combined to form efficient, comprehensive advertisement control commands.
Step (11) above, in which the analysis and recognition results of video and audio are mapped into the corresponding advertisement controls, is now described in detail. As shown in Fig. 2, the basic steps are as follows:
(1) First, classify the advertisement-content control commands: the required command set is classified effectively according to the characteristics of each channel; video is fast, intuitive, and continuously output but susceptible to interference, while sound is natural and quick but its recognition is not highly real-time.
(2) Video-based control: first, the correspondence between the various motion and human-posture cues and the advertisement control quantities must be set. The camera then captures the surrounding scenery of the venue and the audience; the motion and human postures in the image are analyzed and recognized in real time; and, according to the current state and a predictive tracking algorithm, the corresponding control quantities are output;
(3) Audio-based control: first, a keyword dictionary and a mapping table from keywords to related commands must be established. The microphone then collects the sound of the audience and the surrounding scene, and the corresponding control commands are produced according to the pitch-analysis and speech-recognition results;
(4) Through the advertisement control interface, the video and audio commands are integrated in real time into the control of the advertisement's virtual elements and content, or the model is adjusted directly, achieving the purpose of control.
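The per-channel mapping tables and the fusion step can be sketched as plain lookup tables. All cue and command names below are hypothetical; the patent only requires that each channel have its own mapping into a shared advertisement command set.

```python
# Hypothetical command set: motion/posture cues from video and keywords
# from audio each map into one advertisement control vocabulary.
VIDEO_COMMANDS = {"wave": "next_scene", "jump": "zoom_in"}
AUDIO_COMMANDS = {"start": "play", "stop": "pause"}

def fuse_controls(video_cue=None, audio_cue=None):
    """Merge per-channel recognition results into advertisement commands.
    Each channel keeps its own table because video is continuous but
    noise-prone while audio is natural but laggier."""
    commands = []
    if video_cue in VIDEO_COMMANDS:
        commands.append(VIDEO_COMMANDS[video_cue])
    if audio_cue in AUDIO_COMMANDS:
        commands.append(AUDIO_COMMANDS[audio_cue])
    return commands

print(fuse_controls(video_cue="wave", audio_cue="start"))  # ['next_scene', 'play']
```

Either channel may be absent, matching the text's point that the two methods work independently or in combination.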
In summary, the method adopts interactive control by video and audio: the motion and sound of the audience and the surrounding scene are analyzed and recognized by the computer, the results are converted into control commands for the multimedia file, and direct control of the virtual elements in the multimedia is thereby achieved.

Claims (4)

1. An interactive digital multimedia production method based on video and audio, characterized in that it comprises the following steps. Start. Step 1: capture live video images in real time through a digital optical device and preprocess them to obtain preliminary video information. Step 2: segment the preliminary video information obtained in step 1 according to its variation and features in time and space, extract features from the segmented image, extract global motion information and local human-posture information, and convert them into video control information. Step 3: capture live audio data in real time through a digital audio device and preprocess it to obtain preliminary audio information. Step 4: extract the frequency value and amplitude value of the sound from the preliminary audio information obtained in step 3, perform limited-vocabulary speech recognition, and convert the results into audio control information. Steps 1 and 2 form a first step group executed in order, and steps 3 and 4 form a second step group executed in order; said two step groups are independent of each other and may or may not run concurrently, but in either case both proceed to step 5 on completion. Step 5: fuse and process said video control information and audio control information, output a control command for the body, drive the body through a control interface according to said command, and change and output the body content, where said body refers to the multimedia file. End.
2. The interactive digital multimedia production method based on video and audio according to claim 1, characterized in that said multimedia file is a multimedia file used for image display or advertising; said digital optical device is a digital camera; and said digital audio device is a microphone and sound card.
3. The interactive digital multimedia production method based on video and audio according to claim 2, characterized in that the preprocessing in step 1 comprises light correction and denoising of said live video images; the local human-posture information in step 2 comprises the position, direction, and amplitude of the human body and the basic shape parameters it forms; the preprocessing in step 3 comprises processing the live audio data with pitch analysis and speech recognition; and the fusion and processing of said video control information and audio control information in step 5 involve a command-set preprocessing module, a video control-transformation module, and an audio control-transformation module, wherein the command-set preprocessing module classifies the video/audio command set and, according to the received video control information and audio control information, maps the corresponding commands to the video control-transformation module and the audio control-transformation module respectively; the video control-transformation module accepts said video control information and the commands mapped by the command-set preprocessing module and outputs the video control command for the body to the control interface; and the audio control-transformation module accepts said audio control information and the commands mapped by the command-set preprocessing module and outputs the audio control command for the body to the control interface.
4. The interactive digital multimedia production method based on video and audio according to claim 3, characterized in that the denoising of said live video images comprises: first reducing the resolution of the live video image to 1/4 of the original and converting it to a 256-level grayscale image; then averaging corresponding pixels within the frame and between frames to smooth each frame and remove the random noise introduced during capture. The light correction of said live video images means: subtracting the mean of the whole image's pixel values from the value of each pixel, dividing by the variance of the whole image's pixel values, and then multiplying by a coefficient. Said extraction of global motion information comprises: first subtracting each pixel of the current frame from the corresponding pixel of the previous frame and taking the absolute value of the difference, to obtain a frame-difference image describing inter-frame change; then thresholding the frame-difference image, judging whether each pixel is greater than or equal to, or less than, a fixed threshold, to obtain a binary image describing the moving region, with 0 representing below the threshold and 1 representing at or above it; and finally applying edge extraction to this binary image to obtain the edges of the moving region. Said extraction of local human-posture information means: further segmenting the moving foreground according to the result of the global motion-information extraction, performing feature analysis on each region separately, analyzing the shape of the edge contour in a specific region of the binary image and its process of change, extracting rotation- and scale-invariant features, deriving the corresponding posture information, and performing tracking, verification, and prediction based on the result from the previous moment.
CN200610081465A 2006-05-22 2006-05-22 An interactive digital multimedia making method based on video and audio Expired - Fee Related CN100596186C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN200610081465A CN100596186C (en) 2006-05-22 2006-05-22 An interactive digital multimedia making method based on video and audio

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN200610081465A CN100596186C (en) 2006-05-22 2006-05-22 An interactive digital multimedia making method based on video and audio

Publications (2)

Publication Number Publication Date
CN101079996A CN101079996A (en) 2007-11-28
CN100596186C 2010-03-24

Family

ID=38907185

Family Applications (1)

Application Number Title Priority Date Filing Date
CN200610081465A Expired - Fee Related CN100596186C (en) 2006-05-22 2006-05-22 An interactive digital multimedia making method based on video and audio

Country Status (1)

Country Link
CN (1) CN100596186C (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120130822A1 (en) * 2010-11-19 2012-05-24 Microsoft Corporation Computing cost per interaction for interactive advertising sessions
CN103186226A (en) * 2011-12-28 2013-07-03 北京德信互动网络技术有限公司 Man-machine interaction system and method
CN103186227A (en) * 2011-12-28 2013-07-03 北京德信互动网络技术有限公司 Man-machine interaction system and method
CN103905926A (en) * 2014-04-14 2014-07-02 夷希数码科技(上海)有限公司 Method and device for playing outdoor advertisement
CN105407316B (en) * 2014-08-19 2019-05-31 北京奇虎科技有限公司 Implementation method, intelligent camera system and the IP Camera of intelligent camera system
CN104571516B (en) * 2014-12-31 2018-01-05 武汉百景互动科技有限责任公司 Interactive advertisement system
CN107197327B (en) * 2017-06-26 2020-11-13 广州天翌云信息科技有限公司 Digital media manufacturing method
CN109308625A (en) * 2017-07-27 2019-02-05 掌游天下(北京)信息技术股份有限公司 A kind of production method for playing advertisement, system and corresponding storage medium
CN110349576A (en) * 2019-05-16 2019-10-18 国网上海市电力公司 Power system operation instruction executing method, apparatus and system based on speech recognition
CN112348926A (en) * 2020-11-23 2021-02-09 杭州美册科技有限公司 Android-based video splicing app processing method and device

Also Published As

Publication number Publication date
CN101079996A (en) 2007-11-28


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C56 Change in the name or address of the patentee

Owner name: BEIJING SHENGKAI INTERACTIVE TECHNOLOGY CO., LTD.

Free format text: FORMER NAME: BEIJING SHENGKAI INTERACTIVE ENTERTAINMENT TECHNOLOGY CO., LTD.

CP01 Change in the name or title of a patent holder

Address after: 100080, Beijing, Zhichun Road, Haidian District, No. 63 satellite building, 9 floor

Patentee after: Beijing Interjoy Technology Limited

Address before: 100080, Beijing, Zhichun Road, Haidian District, No. 63 satellite building, 9 floor

Patentee before: Beijing Interjoy Technology Limited

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20100324

Termination date: 20180522