To download a corpus, select a corpus size (given as a number of sentences) and download the corresponding data file.
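Each size in the tables below is a label for a sentence count, where K means thousand and M means million. The following Python sketch assumes you know roughly how many sentences your task needs and picks the smallest listed size that covers it. The helpers `label_to_sentences` and `pick_size` are hypothetical illustrations and are not part of any official download tool.

```python
# Hypothetical helpers for illustration only; not part of any official download tool.
# Corpus sizes offered on this page, written as labels for a number of sentences.
SIZE_LABELS = ["10K", "30K", "100K", "300K", "1M", "3M"]

# Multipliers for the K (thousand) and M (million) suffixes.
_SUFFIX = {"K": 1_000, "M": 1_000_000}


def label_to_sentences(label: str) -> int:
    """Convert a size label such as '300K' or '1M' to a sentence count."""
    number, suffix = label[:-1], label[-1].upper()
    if suffix not in _SUFFIX:
        raise ValueError(f"unknown size suffix in {label!r}")
    return int(number) * _SUFFIX[suffix]


def pick_size(needed_sentences: int, labels=SIZE_LABELS) -> str:
    """Return the smallest listed size with at least `needed_sentences` sentences.

    Falls back to the largest listed size if none is big enough.
    """
    ordered = sorted(labels, key=label_to_sentences)
    for label in ordered:
        if label_to_sentences(label) >= needed_sentences:
            return label
    return ordered[-1]


print(pick_size(250_000))    # 300K
print(pick_size(5_000_000))  # 3M
```

Picking the smallest sufficient size keeps downloads small; if you need more sentences than the largest file offers, the sketch simply returns the largest size.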
Mixed
Year Country Downloads
2012 10K 30K 100K 300K 1M 3M
2013 10K 30K 100K 300K 1M 3M
Mixed-tufs4
Year Country Downloads
2012 10K 30K 100K 300K 1M 3M
News
Year Country Downloads
2008 10K 30K 100K 300K 1M 3M
2009 10K 30K 100K 300K 1M 3M
2010 10K 30K 100K 300K 1M 3M
2011 10K 30K 100K 300K 1M 3M
2012 10K 30K 100K 300K 1M 3M
2019 10K 30K 100K 300K 1M 3M
2020 10K 30K 100K 300K 1M 3M
2022 10K 30K 100K 300K 1M 3M
News-tufs10
Year Country Downloads
2011 10K 30K 100K 300K 1M 3M
News-tufs11
Year Country Downloads
2012 10K 30K 100K 300K 1M 3M
News-tufs7
Year Country Downloads
2008 10K 30K 100K 300K 1M 3M
News-tufs8
Year Country Downloads
2009 10K 30K 100K 300K 1M 3M
News-tufs9
Year Country Downloads
2010 10K 30K 100K 300K 1M 3M
Newscrawl
Year Country Downloads
2011 10K 30K 100K 300K 1M 3M
2012 10K 30K 100K 300K 1M 3M
2015 10K 30K 100K 300K 1M 3M
2016 10K 30K 100K 300K 1M 3M
Newscrawl-tufs5
Year Country Downloads
2011 10K 30K 100K 300K 1M 3M
Newscrawl-tufs6
Year Country Downloads
2012 10K 30K 100K 300K 1M 3M
Web
Year Country Downloads
2011 10K 30K 100K 300K 1M 3M
2012 10K 30K 100K 300K 1M 3M
2013 Indonesia 10K 30K 100K 300K 1M 3M
2015 Brunei 10K 30K 100K 300K 1M 3M
2015 India 10K 30K 100K 300K 1M 3M
2015 Indonesia 10K 30K 100K 300K 1M 3M
2017 Indonesia 10K 30K 100K 300K 1M 3M
2018 com 10K 30K 100K 300K 1M 3M
Web-public
Year Country Downloads
2017 Indonesia 10K 30K 100K 300K 1M 3M
Web-tufs12
Year Country Downloads
2011 10K 30K 100K 300K 1M 3M
Web-tufs13
Year Country Downloads
2012 10K 30K 100K 300K 1M 3M
Web-tufs2
Year Country Downloads
2013 10K 30K 100K 300K 1M 3M
Web-tufs3
Year Country Downloads
2015 10K 30K 100K 300K 1M 3M
Wikipedia
Year Country Downloads
2010 10K 30K 100K 300K 1M 3M
2014 10K 30K 100K 300K 1M 3M
2016 10K 30K 100K 300K 1M 3M
2021 10K 30K 100K 300K 1M 3M
Wikipedia-tufs14
Year Country Downloads
2016 10K 30K 100K 300K 1M 3M
Wikipedia-tufs16
Year Country Downloads
2016 10K 30K 100K 300K 1M 3M