!!!Network datasets
# [Pajek Datasets|http://vlado.fmf.uni-lj.si/pub/networks/data/]\\
  When publishing results obtained using these datasets, the original authors should be cited. In addition, this collection should be cited as:\\
  Vladimir Batagelj and Andrej Mrvar (2006): Pajek datasets. <url: http://vlado.fmf.uni-lj.si/pub/networks/data/>
# [Newman's Network data|http://www-personal.umich.edu/~mejn/netdata/]
# [Stanford Large Network Dataset Collection|http://snap.stanford.edu/data/]\\
 Most of the datasets on this site were collected (scraped) by the SNAP group for its own research.\\
 Please cite: http://snap.stanford.edu/data/
# [Social Network Datasets (Datamob)|http://datamob.org/datasets/tag/social-networks]
# [DBLP|http://dblp.uni-trier.de/xml/]\\
 The DBLP Computer Science Bibliography
# [Twitter social graphs|http://an.kaist.ac.kr/traces/WWW2010.html/]\\
   Twitter follower graph
# [Online Social Network data|http://socialnetworks.mpi-sws.org/data-imc2007.html]\\
  Flickr (users, links, groups, group memberships), LiveJournal, Orkut, and YouTube (users, links, groups, group memberships)
# [Extracted DBLP Dataset|http://gdm.fudan.edu.cn/GDMWiki/attach/Network%20DataSet/dblp.rar]\\
  This dataset contains more than 180,000 papers extracted from DBLP, each with its title, authors, year, venue, and topics. Papers are labeled with [25 topics|http://gdm.fudan.edu.cn/GDMWiki/attach/Network%20DataSet/25topics.txt] identified by SVM classifiers. The dataset was used in our papers [Which Topic will You Follow? (ECML-PKDD2012)|http://gdm.fudan.edu.cn/GDMWiki/attach/By%20Year/Which%20Topic%20will%20You%20Follow.pdf] and [Towards topic following in heterogeneous information networks (ASONAM2015)|http://gdm.fudan.edu.cn/files1/0704f3/DeqingYang_ASONAM.pdf].

!!!Recommendation Datasets
#[Weibo tag and followship dataset|http://gdm.fudan.edu.cn/GDMWiki/attach/Network%20DataSet/weibo.sql.zip] and [Douban movie/user tag and rating dataset|http://gdm.fudan.edu.cn/GDMWiki/attach/Network%20DataSet/Douban_br.sql]\\
    These two datasets were used in our experiments on Weibo followship recommendation and Douban movie recommendation, respectively.
#[Prostate cancer|http://datam.i2r.a-star.edu.sg/datasets/krbd/ProstateCancer/ProstateCancer.html]
#[Cross-domain recommendation dataset on diabetes (used in our CIKM2015 and ICDM2015 papers)|http://gdm.fudan.edu.cn/GDMWiki/attach/Network%20DataSet/Data_Diabetes.rar]
#[Douban Movie dataset (used in our CIKM2015 and ICDM2015 papers)|http://gdm.fudan.edu.cn/GDMWiki/attach/Network%20DataSet/movie2.txt]
#[Human assessment survey for Douban Movie recommendation (50 samples, used in our CIKM2015 and ICDM2015 papers)|http://gdm.fudan.edu.cn/GDMWiki/attach/Network%20DataSet/HumanAssess_DoubanMovie.zip]

!!!Entity Resolution
# [arXiv hep-th|http://www.cs.cornell.edu/projects/kddcup/datasets.html]: KDD Cup 2003 publication dataset, the hep-th portion of arXiv
# [CiteSeer|http://citeseer.ist.psu.edu/oai.html]: collection of research publications
# [Cora|http://www.cs.utexas.edu/users/ml/riddle/data.html]: a citation dataset from RIDDLE data repository
# [Cora|http://www.cs.umass.edu/~mccallum/code-data.html]: a citation dataset from Andrew McCallum's data repository
# [DBLP|http://dblp.uni-trier.de/xml]: collection of bibliographic entries
# [DMOZ ontology|http://www.dmoz.org/rdf.html]: a large downloadable ontology
# [Enron Email Dataset|http://www.cs.cmu.edu/~enron/]: a dataset of Enron emails
# [FEBRL Database|http://sourceforge.net/projects/febrl/]: Freely Extensible Biomedical Record Linkage
# [Freedb CD Dataset|http://ftp.freedb.org/pub/freedb/]: information on various audio CDs
# [IMDb|http://www.imdb.com/interfaces#plain]: collection of movie-related entries
# [PubMed/MEDLINE|http://www.nlm.nih.gov/databases/license/weblic/index.html]: over 20 million bibliographic entries for biomedical literature
# [RIDDLE Repository|http://www.cs.utexas.edu/users/ml/riddle/]: various data cleaning-related datasets
# [Spock Challenge|http://challenge.spock.com/download]: collection of labeled webpages for the Spock Challenge (registration required)
# [Stanford Movie Dataset|http://www-db.stanford.edu/pub/movies/doc.html]: collection of movie-related entries
# [UC Irvine Machine Learning Repository|http://archive.ics.uci.edu/ml/]: collection of various ML datasets
# [UIS Database Generator|http://www.cs.utexas.edu/users/ml/riddle/data/dbgen.tar.gz]: generates synthetic names and addresses by injecting errors into clean records
# [U.S. Census Names|http://www.census.gov/genealogy/names/]: frequently occurring first names and surnames from the 1990 Census
# [Web Disambiguation|http://www.cs.umass.edu/~ronb/web_appear/dataset.tar.gz]: collection of labeled webpages used by Bekkerman and McCallum in WWW'05
# [WePS Corpus|http://nlp.uned.es/weps/weps-corpus.html]: collection of labeled webpages used by Artiles, Gonzalo, and Verdejo in SIGIR'05
# [Wiktionary|http://download.wikimedia.org/]: downloadable free-content multilingual dictionary