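Dr. Yangqiu Song (宋阳秋), postdoctoral researcher at HKUST and visiting researcher at Huawei Noah's Ark Lab, visits the lab and gives a talk

Talk title: Collaborative Boosting for Activity Classification in Microblogs

Speaker: Yangqiu Song, Ph.D., postdoctoral researcher at HKUST and visiting researcher at Huawei Noah's Ark Lab

Time: Friday, September 20, 2013, 10:00 AM

Location: Meeting Room 2, Room 102, Software Building, Zhangjiang Campus, Fudan University

Contact: Yanghua Xiao (肖仰华, shawyh@fudan.edu.cn)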

Abstract: Users’ daily activities, such as dining and shopping, inherently reflect their habits, intents and preferences, and thus provide invaluable information for services such as personalized information recommendation and targeted advertising. Users’ activity information, although ubiquitous on social media, has remained largely unexploited. We address the task of user activity classification in microblogs, where users can publish short messages and maintain social networks online. We identify the importance of modeling a user’s individuality, and that of exploiting the opinions of the user’s friends, for accurate activity classification. In this light, we propose a novel collaborative boosting framework comprising a text-to-activity classifier for each user and a mechanism for collaboration between the classifiers of users who have social connections. The collaboration between two classifiers includes exchanging their own training instances and their dynamically changing labeling decisions. We propose an iterative learning procedure that is formulated as gradient descent in the learning function space, while opinion exchange between classifiers is implemented with weighted voting in each learning iteration. We show through experiments on real-world data from Sina Weibo that our method outperforms existing off-the-shelf algorithms that do not take users’ individuality or social connections into account.

BIO: Dr. Yangqiu Song has been a postdoctoral researcher at HKUST and a visiting researcher at Huawei Noah's Ark Lab, Hong Kong, since October 2012. Before that, he was an associate researcher at Microsoft Research Asia (2010-2012) and a staff researcher at IBM Research-China (2009-2010). He received his B.E. and Ph.D. degrees from Tsinghua University, China, in July 2003 and January 2009, respectively. He also worked as an intern at IBM in 2006-2007 and at Google in 2007-2008. His current research focuses on using machine learning and data mining to extract and infer deep knowledge from big data, including large-scale learning algorithms, lifelong (evolutionary, never-ending) machine learning mechanisms, natural language understanding, and knowledge engineering. This knowledge helps users better enjoy daily living and social activities, and helps data scientists do better business analytics. Dr. Song has been an IEEE member and an ACM member for several years, and most of his publications fall within these two communities. He has also served many top-tier conferences as a program committee (PC) member and many journals as a reviewer, e.g., as a PC member for KDD’13, IJCAI’13, WSDM’13, RecSys’13, ACL’13, IUI’13, ACML’13, ICTAI’13, and ICMLA’07-13, and as a journal reviewer for IEEE Transactions on Pattern Analysis and Machine Intelligence, IEEE Transactions on Neural Networks, Pattern Recognition, IEEE Transactions on Knowledge and Data Engineering, Data Mining and Knowledge Discovery, IEEE Transactions on Systems, Man and Cybernetics - Part B, etc.

Posted at: 2013-9-20


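Dr. Daisy Zhe Wang, assistant professor at the University of Florida, visits the lab and gives a talk on large-scale knowledge graph construction

Talk title: Large Probabilistic Knowledge Base Systems

Speaker: Daisy Zhe Wang, Ph.D., Computer and Information Science and Engineering, University of Florida

Time: Monday, August 5, 2013, 10:00 AM

Location: Meeting Room 2, Room 102, Software Building, Zhangjiang Campus, Fudan University

Contact: Yanghua Xiao (肖仰华, shawyh@fudan.edu.cn)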

Abstract: Keyword search engines have been the state-of-the-art information retrieval tool over large text corpora for two decades. To date, most search engines have little understanding that keywords and documents refer to entities and relations in real life. Better search results and experience can be achieved by understanding entities and relations in documents as well as in queries. A knowledge base (KB) containing relevant entities and relations should be the backbone of any application that is fulfilled by text. Given a large amount of text data, a system is needed that can automatically construct a knowledge base using statistical machine learning (SML) methods, manage the uncertainty inherent in the extracted knowledge, and maintain it over time. In this talk, I first summarize the major results of BayesStore, a probabilistic database system that natively supports SML models and various inference algorithms to perform query-driven knowledge extraction from text and probabilistic query processing over uncertain extractions. Results show that BayesStore can significantly improve performance and answer quality for queries over unstructured text. With BayesStore as a foundation, I propose to build a probabilistic knowledge base (probKB) system with a deep integration of SML methods and scalable data processing frameworks. A probKB system should be designed to support various aspects of the life of a knowledge base, including KB extraction, expansion, evolution and integration. I will discuss the challenges in detail, along with our current progress.

BIO: Daisy Zhe Wang is an assistant professor in the CISE department at the University of Florida. She obtained her Ph.D. degree from the EECS Department of the University of California, Berkeley in 2011 and her Bachelor’s degree from the ECE department at the University of Toronto in 2005. At Berkeley she was a member of the Database Group and the AMP/RAD Lab. She is particularly interested in bridging scalable data management and processing systems with probabilistic models and statistical methods. She currently pursues research topics such as probabilistic databases, probabilistic knowledge bases, large-scale inference engines, query-driven interactive machine learning and crowd-assisted machine learning. Her research is currently funded by DARPA, Greenplum/EMC, SurveyMonkey and Law School at UK.

Posted at: 2013-8-5


Invited talk: Machine Learning with Big Graph Data

Speaker: Prof. Jun Huan, University of Kansas

Location: Meeting Room 3, Room 102, Software Building, Zhangjiang Campus, Fudan University

Time: July 4, 2013, 10:00 AM

Abstract: Graphs are widely used modeling tools that capture objects and their relations. Graph-modeled data are found in diverse application areas including bioinformatics, cheminformatics, social networks, and wireless sensor networks, among many others. In this talk we will present our recent work on graph kernel functions and graph similarity search in the context of big data, focusing on scalable algorithmic approaches for graph data. Applications of graph modeling techniques in bioinformatics and social network analysis will be touched on at the end.

Bio: Dr. Jun (Luke) Huan is an Associate Professor in the Department of Electrical Engineering and Computer Science at the University of Kansas. He directs the Bioinformatics and Computational Life Science Laboratory at the KU Information and Telecommunication Technology Center (ITTC) and the cheminformatics core at the KU Specialized Chemistry Center. Dr. Huan holds courtesy appointments at the KU Bioinformatics Center and the KU bioengineering program, and a visiting professorship from GlaxoSmithKline plc. Dr. Huan received his Ph.D. in Computer Science from the University of North Carolina at Chapel Hill. Before joining KU in 2006, he worked at Argonne National Laboratory and GlaxoSmithKline plc. Dr. Huan was a recipient of the National Science Foundation Faculty Early Career Development Award in 2009. He has published more than 80 peer-reviewed papers in leading conferences and journals, including Nature Biotechnology. His group won the Best Student Paper Award at the IEEE International Conference on Data Mining in 2011 and the Best Paper Award (runner-up) at the ACM International Conference on Information and Knowledge Management in 2009. Dr. Huan has served on the program committees of prestigious international conferences including ACM SIGMOD, ACM CIKM, ICML, IEEE ICDM, and IEEE BigData.

Posted at: 2013-7-4


Our lab's work accepted by VLDB 2014, a top academic conference in the database field

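The paper Distance Oracle on Billion Node Graphs, co-authored by Prof. Xiao and undergraduate Zichao Qi (齐自超, since admitted to MIT as a Ph.D. student), whom Prof. Xiao supervised while visiting Microsoft Research Asia, has been accepted by VLDB 2014, one of the three top database conferences. VLDB is a top conference in the database field, on par with SIGMOD. To date, most undergraduates supervised by Prof. Xiao have published papers at top venues during their undergraduate studies, including outstanding students such as Ji Hong (洪骥), Wanyun Cui (崔万云), and Zichao Qi (齐自超). Most graduate students supervised by Prof. Xiao have published at least one paper at a tier-2 academic conference or in a well-known SCI journal.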

Posted at: 2013-7-4


Our lab's work accepted by PLOS One, an internationally known SCI journal

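CGAP-Align, a large-scale gene sequence alignment tool developed mainly by Yaoliang Chen (陈垚亮, now at IBM Research-China) during graduate studies, with contributions from Ji Hong (洪骥, then a third-year undergraduate, now a Ph.D. student at UT Austin) and Wanyun Cui (崔万云, then a second-year undergraduate, now in the fourth year), has been formally accepted by PLOS One, an internationally known SCI journal (2012 impact factor: 4). PLOS One is one of the top multidisciplinary academic journals and the fourth most influential, behind only Nature, Science, and PNAS. CGAP-Align is already in practical use at the Human Genome Sequencing Center of Baylor College of Medicine in the United States. Congratulations to these students!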

Posted at: 2013-4-9


Our lab's work accepted by SIGMOD 2013, a top international conference in the database field

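The full research paper Online Search of Overlapping Communities, completed by Wanyun Cui (崔万云) as a third-year undergraduate with contributions from Yiqi Lu (鲁轶奇), has been formally accepted by SIGMOD 2013. SIGMOD is the most prestigious international academic conference in the database field, and the main body of this work was completed independently by our team. Congratulations to both students!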

Posted at: 2013-4-5


Our lab's work reported and reprinted in Xinmin Evening News and Jiefang Daily

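The analysis of trending microblog posts about Premier Li Keqiang's 2013 press conference, carried out by Jiaqing Liang (梁家卿) and Wanyun Cui (崔万云), was reported first by Xinmin Evening News and then by Jiefang Daily:
http://newspaper.jfdaily.com/jfrb/html/2013-03/19/content_991702.htm
http://sh.eastday.com/m/20130318/u1a7265694.html
It was also reprinted by dozens of mainstream media outlets.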

Posted at: 2013-3-19


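Lab recruiting new fourth-year members

The lab is now recruiting fourth-year undergraduates who are interested in our research areas to carry out their graduation projects and theses.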

Posted at:2010-9-9


Invited Talk: Query Friendly Compression of Social Networks Using Multi-Position Linearization
Presented by: Jian Pei, Simon Fraser University, Canada
Time:
2010-9-10, Friday, 1:30PM
Location:
Room 310, Baozhang Building, Zhangjiang Campus
Abstract and bio of Jian Pei

Posted at: 2010-9-9


The Graph Data Management research group website is now live!

Posted at: 2010-9-9


This page was last modified by jxulie on 31 October 2013, 10:49.