Google has launched a new website that uses anonymous location data collected from users of Google products and services to show the level of social distancing taking place in various locations. The COVID-19 Community Mobility Reports website will show population data trends for six categories: retail and recreation, grocery and pharmacy, parks, transit stations, workplaces, and residential. The data will track changes over the course of several weeks, up to as recently as 48 to 72 hours prior, and will initially cover 131 countries as well as individual counties within certain states. (cf. www.google.com/covid19/mobility/.)
The reports contain charts and comments of the form: NN% compared to baseline (for each of the six categories mentioned above), where NN is a number. It is assumed that the number is the percent change on the last date depicted (which, incidentally, is part of the filename). So, for example, the file 2020-03-29_PL_Mobility_Report_en.pdf contains the sentence `Retail & recreation -78% compared to baseline`, which (probably) means that the (somehow) registered traffic at R&R facilities was 22% of the baseline. Anyway, those six numbers were extracted for OECD countries (and some other countries) and converted to a CSV file.
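The reading of -78% as "22% of the baseline" is plain arithmetic: 100 plus the signed percent change. A minimal sketch, assuming the reported figure really is a simple percent change from the baseline (the post itself only says "probably"); the helper name `share_of_baseline` is made up for this illustration and is not part of the original scripts:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn a string such as "-78%" into the implied
# share of the baseline, in percent. Assumes the figure is a simple
# percent change from the baseline.
sub share_of_baseline {
    my ($change) = @_;
    # Accept an optional sign, digits, and an optional trailing % sign.
    return unless defined $change && $change =~ /^\s*([+-]?\d+)\s*%?\s*$/;
    return 100 + $1;
}

for my $change ('-78%', '+12%', '0%') {
    printf "%5s -> %d%% of baseline\n", $change, share_of_baseline($change);
}
```

Anything that does not look like a signed integer percentage yields an undefined value instead of a guess.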
The conversion to CSV was as follows: first, the PDF files were downloaded with a simple Perl script:
    #!/usr/bin/perl
    # https://www.google.com/covid19/mobility/
    use LWP::UserAgent;
    use POSIX 'strftime';

    # Pause (in seconds) between consecutive requests
    my $sleepTime = 11;

    # Country name => two-letter code used in the report filenames
    %OECD = ('Australia' => 'AU', 'New Zealand' => 'NZ',
      'Austria' => 'AT', 'Norway' => 'NO',
      'Belgium' => 'BE', 'Poland' => 'PL',
      'Canada' => 'CA', 'Portugal' => 'PT',
      'Chile' => 'CL', 'Slovak Republic' => 'SK',
      ## etc ...
    );
    @oecd = values %OECD;

    my $ua = LWP::UserAgent->new(agent => 'Mozilla/5.0', cookie_jar =>{});
    my $date = "2020-03-29";

    foreach $c (sort @oecd) {
      $PP="https://www.gstatic.com/covid19/mobility/${date}_${c}_Mobility_Report_en.pdf";
      my $req = HTTP::Request->new(GET => $PP);
      # The second argument makes LWP save the response body to that file
      my $res = $ua->request($req, "${date}_${c}_Mobility_Report_en.pdf");
      if ($res->is_success) { print $res->as_string; }
      else { print "Failed: ", $res->status_line, "\n"; }
      sleep $sleepTime;
    }
Next, the PDF files were converted to .txt with pdftotext. The relevant fragments of the .txt files look like:
    Retail & recreation
    +80%
    -78%
    compared to baseline
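The conversion that produces such text can be batch-run from Perl. A minimal sketch, assuming the `pdftotext` command-line tool is installed and on the PATH, and that the reports sit in the current directory under the filenames used by the download script; the helper `pdftotext_cmd` and the choice of output name are illustrative, not taken from the original workflow:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the pdftotext command for one report; the output name simply
# swaps the .pdf extension for .txt (a naming choice made here).
sub pdftotext_cmd {
    my ($pdf) = @_;
    (my $txt = $pdf) =~ s/\.pdf$/.txt/i;
    return ('pdftotext', $pdf, $txt);
}

# Convert every downloaded report found in the current directory.
for my $pdf (sort glob('*_Mobility_Report_en.pdf')) {
    my @cmd = pdftotext_cmd($pdf);
    # List-form system() avoids shell quoting issues; 0 means success.
    system(@cmd) == 0
        or warn "pdftotext failed for $pdf (exit status $?)\n";
}
```

No extra pdftotext options are passed here. The extraction relies on the figure sitting on the line directly before 'compared to baseline', so it is worth checking one converted file by eye before processing the rest.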
So it looks easy to extract the relevant numbers: scan line by line, looking for a line with the appropriate content (Retail & recreation, for example). Once it is found, start searching for 'compared to baseline'. Once that is found, retrieve the previous line:
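    #!/usr/bin/perl
    # Remember the input file name before <> shifts it off @ARGV
    $file = $ARGV[0];

    while (<>) {
      chomp();
      # Each category label triggers a scan for its "compared to baseline" value
      if (/Retail \& recreation/ ) { $rr = scan2base(); }
      if (/Grocery \& pharmacy/ ) { $gp = scan2base(); }
      if (/Parks/ ) { $parks = scan2base(); }
      if (/Transit stations/ ) { $ts = scan2base(); }
      if (/Workplaces/ ) { $wps = scan2base(); }
      if (/Residential/ ) { $res = scan2base();
        # Residential is the last category: print one CSV row and stop
        print "$file;$rr;$gp;$parks;$ts;$wps;$res\n";
        last;
      }
    }

    # Read on until "compared to baseline"; return the line just before it
    sub scan2base {
      while (<>) {
        chomp();
        if (/compared to baseline/) { return ($prevline); }
        $prevline = $_;
      }
    }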
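The same scan can also be written as a function over an array of lines, which makes it easy to try on a small sample before running it on real files. This is only a sketch under assumptions: the six labels are assumed to appear in the order listed, the sample mimics the fragment shown earlier, and `extract_changes` is a hypothetical name rather than part of the original script:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Category labels in the order they are assumed to appear in the text.
my @LABELS = ('Retail & recreation', 'Grocery & pharmacy', 'Parks',
              'Transit stations', 'Workplaces', 'Residential');

# Hypothetical helper: return one value per label, taken from the line
# directly before the first "compared to baseline" that follows the label.
sub extract_changes {
    my @lines = @_;
    my @values;
    my $i = 0;
    for my $label (@LABELS) {
        # Advance to the line that mentions the current category.
        $i++ while $i < @lines && index($lines[$i], $label) < 0;
        # Then advance to the "compared to baseline" marker.
        $i++ while $i < @lines && $lines[$i] !~ /compared to baseline/;
        # Missing data is reported as an empty field rather than guessed.
        push @values, ($i > 0 && $i < @lines) ? $lines[$i - 1] : '';
    }
    return @values;
}

# Sample input shaped like the fragment shown earlier (layout assumed).
my @sample = ('Retail & recreation', '+80%', '-78%', 'compared to baseline');
print join(';', extract_changes(@sample)), "\n";
```

For a real report, the lines would be read from the .txt file and chomped first. Categories missing from the text come back as empty fields, so every CSV row keeps six columns.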
Extracted data can be found here.