Network datasets#
- Pajek Datasets
When publishing results obtained using this data set the original authors should be cited. In addition this collection should be cited as:
Vladimir Batagelj and Andrej Mrvar (2006): Pajek datasets. <url: http://vlado.fmf.uni-lj.si/pub/networks/data/>
- Newman's Network data
- Stanford Large Network Dataset Collection
The datasets available on the website were mostly collected (scraped) for the purposes of our research.
Please cite: http://snap.stanford.edu/data/
- Social Network Dataset
- DBLP
The DBLP Computer Science Bibliography
- Tweet social graphs
Twitter followship graph
- Online Social Network data
Flickr (users, links, groups, group membership), LiveJournal, Orkut, YouTube (users, links, groups, group membership)
- Datamob / Datasets / social networks
- Extracted DBLP Dataset
In this dataset, we extracted more than 180,000 papers with title, authors, year, venue, and topics. There are 25 topics in total, identified by SVM classifiers. It was used in our papers Which Topic will You Follow? (ECML-PKDD2012) and Towards topic following in heterogeneous information networks (ASONAM2015).
Recommendation Datasets#
- Weibo tag and followship dataset
and Douban movie/user tag and rating dataset
These two datasets were used in the experiments on Weibo followship recommendation and Douban movie recommendation reported in our papers at ASONAM2018, ICDM2018, and DASFAA2019.
- Prostate cancer
- Cross-domain recommendation dataset of Diabetes (used in our papers in CIKM2015 and ICDM2015)
- Douban Movie dataset (used in our papers in CIKM2015 and ICDM2015)
- Human assessment survey for Douban Movie recommendation (50 samples used in our papers in CIKM2015 and ICDM2015)
Entity Resolution#
- arXiv hep-th
: KDD Cup 2003 publication dataset, hep-th portion of arXiv
- CiteSeer
: collection of research publications
- Cora
: a citation dataset from RIDDLE data repository
- Cora
: a citation dataset from Andrew McCallum's data repository
- DBLP
: collection of bibliographic entries
- DMOZ ontology
: a large downloadable ontology
- Enron Email Dataset
: a dataset of Enron emails
- FEBRL Database
: Freely Extensible Biomedical Record Linkage
- Freedb CD Dataset
: information on various CDs
- IMDb
: collection of movie-related entries
- PubMed/MEDLINE
: over 20 million bibliographic entries for biomedical literature
- RIDDLE Repository
: various data cleaning-related datasets
- SPOKE Challenge
: (registration is required) collection of labeled webpages for SPOKE Challenge
- Stanford Movie Dataset
: collection of movie-related entries
- UC Irvine Machine Learning Repository
: collection of various ML datasets
- UIS Database Generator
: generates synthetic names and addresses by injecting errors into clean records
- U.S. Census Names
: frequently occurring first names and surnames from the 1990 Census
- Web Disambiguation
: collection of labeled webpages used by Bekkerman and McCallum in WWW'05
- WEPS Corpus
: collection of labeled webpages used by Artiles, Gonzalo, and Verdejo in SIGIR'05
- Wiktionary
: downloadable free-content multilingual dictionary
List of attachments
Kind | Attachment Name | Size | Version | Date Modified | Author | Change note |
---|---|---|---|---|---|---|
txt | 25topics.txt | 0.6 kB | 1 | 07-Oct-2014 22:29 | yangdeqing | |
rar | Data_Diabetes.rar | 85,956.2 kB | 1 | 10-Nov-2014 14:06 | yangdeqing | |
sql | Douban_br.sql | 38,930.3 kB | 1 | 09-Jan-2018 15:26 | yangdeqing | |
zip | HumanAssess_DoubanMovie.zip | 43.4 kB | 1 | 02-Nov-2015 00:14 | yangdeqing | |
rar | dblp.rar | 6,779.8 kB | 1 | 12-Mar-2013 19:48 | fd_yangdq | DBLP dataset(with topics) |
zip | douban.sql.zip | 76,593.6 kB | 1 | 03-Oct-2017 07:18 | yangdeqing | 5k users and 42k movies of Douban |
txt | movie2.txt | 785.9 kB | 1 | 02-Apr-2015 09:19 | yangdeqing | Douban Movies |
zip | weibo.sql.zip | 145,489.4 kB | 1 | 03-Oct-2017 07:24 | yangdeqing | Weibo user tags and followships |
This page (revision-) was last changed on 19-Jan-2019 23:06 by yangdeqing