IMHO there is acute lack of interesting data sets for statistical education purposes, particularly cross-sectional in nature and not related to biology/medicine/astronomy. The domains I am looking for are management, demography and economics but sport is OK too as many students are interested in sport.
I agree there is a lot of data published (mainly) by various government agencies but its almost 100% time series data. There are many test datasets too (like UC Irvine Machine Learning Repository), but they are usually boring, often categorical and short:-)
To remedy a little the lack of good data I compiled myself a few databases and I intend to add more in the future. At present my repository consists of 4 datasets concerning: -- 1100 or so German WW2 U-boots; -- 2200 or so Titanic passangers/crew members -- 1200 or so Rugby Union players who took part in RWC 2011 and RWC 2007 tournaments.
The repository is available at my github account.
Brak komentarzy:
Prześlij komentarz