【Author affiliations】1. National Key Laboratory of Fundamental Science on Synthetic Vision, Sichuan University, Chengdu, China; 2. College of Computer Science, Sichuan University, Chengdu, China; 3. School of Information Science and Technology, Tibet University, Lhasa, China
【Conference year】2022
【Conference location】Padua, Italy
【Abstract】Visual and auditory modalities both contain a large amount of rich information about audio-visual events. While the human perception system can effectively fuse the information of the dual modalities in recognizing events, it is still an open issue h...