Google has launched a new website that uses anonymous location data collected from users of Google products and services to show the level of social distancing taking place in various locations. The COVID-19 Community Mobility Reports web site will show population data trends of six categories: Retail and recreation, grocery and pharmacy, parks, transit stations, workplaces, and residential. The data will track changes over the course of several weeks, and as recent as 48-to-72 hours prior, and will initially cover 131 countries as well as individual counties within certain states. (cf. www.google.com/covid19/mobility/.)
The reports contain charts and comments in the form: NN% compared to baseline (for each of the six categories mentioned above), where NN is a number. The number is assumed to be the percent change on the last date depicted (which, incidentally, is part of the filename). For example, the file 2020-03-29_PL_Mobility_Report_en.pdf contains the sentence `Retail & recreation -78% compared to baseline`, which (probably) means that the (somehow) registered traffic at R&R facilities was 22% of the baseline. Anyway, those six numbers were extracted for OECD countries (and some other countries) and converted to a CSV file.
The conversion was as follows: first, the PDF files were downloaded with a simple Perl script:
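The arithmetic behind that reading can be sketched in a few lines of Perl. The helper name share_of_baseline is hypothetical, and treating the figure as a percent change relative to a baseline of 100% is the assumption stated above, not something the reports confirm.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn a report figure such as "-78%" into the
# share of baseline traffic it implies (assuming the figure is a
# percent change relative to a baseline of 100%).
sub share_of_baseline {
    my ($change) = @_;
    # Accept an optional sign, digits and an optional trailing "%".
    my ($value) = $change =~ /^\s*([+-]?\d+)\s*%?\s*$/
        or return;    # not a recognisable figure
    return 100 + $value;
}

print share_of_baseline('-78%'), "\n";   # prints 22
print share_of_baseline('+12%'), "\n";   # prints 112
```

Under this reading a negative figure means less traffic than the baseline and a positive one means more.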
#!/usr/bin/perl
# Download Community Mobility Reports (PDF) for selected countries.
# https://www.google.com/covid19/mobility/
use strict;
use warnings;
use LWP::UserAgent;

my $sleepTime = 11;   # seconds to wait between requests

# Country name => two-letter code used in the report file names
my %OECD = ('Australia' => 'AU', 'New Zealand' => 'NZ',
  'Austria' => 'AT', 'Norway' => 'NO', 'Belgium' => 'BE',
  'Poland' => 'PL', 'Canada' => 'CA', 'Portugal' => 'PT',
  'Chile' => 'CL', 'Slovak Republic' => 'SK',
  ## etc ...
);
my @oecd = values %OECD;

my $ua = LWP::UserAgent->new(agent => 'Mozilla/5.0', cookie_jar => {});
my $date = "2020-03-29";

foreach my $c (sort @oecd) {
  my $file = "${date}_${c}_Mobility_Report_en.pdf";
  my $url  = "https://www.gstatic.com/covid19/mobility/$file";
  my $req  = HTTP::Request->new(GET => $url);
  # The second argument makes LWP save the response body to that file
  my $res  = $ua->request($req, $file);
  if ($res->is_success) { print $res->as_string; }
  else { print "Failed: ", $res->status_line, "\n"; }
  sleep $sleepTime;   # pause between downloads
}
Next, the PDF files were converted to .txt with pdftotext. The relevant fragments of the .txt files look like:
Retail & recreation
+80%
-78%
compared to baseline
So the relevant numbers look easy to extract: scan the file line by line for a category heading (Retail & recreation, for example). Once it is found, keep reading until a line containing 'compared to baseline' appears and take the line just before it:
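#!/usr/bin/perl
# Extract the six "compared to baseline" figures from a pdftotext output
# file and print them as one semicolon-separated line.
use strict;
use warnings;

my $file = $ARGV[0];   # remember the file name before <> consumes @ARGV
my ($rr, $gp, $parks, $ts, $wps, $res) = ('') x 6;

while (<>) {
  chomp;
  if (/Retail \& recreation/) { $rr = scan2base(); }
  if (/Grocery \& pharmacy/) { $gp = scan2base(); }
  if (/Parks/) { $parks = scan2base(); }
  if (/Transit stations/) { $ts = scan2base(); }
  if (/Workplaces/) { $wps = scan2base(); }
  if (/Residential/) { $res = scan2base();
    print "$file;$rr;$gp;$parks;$ts;$wps;$res\n";
    last; }
}

# Read on until 'compared to baseline' and return the line just before it
sub scan2base {
  my $prevline = '';
  while (<>) {
    chomp;
    if (/compared to baseline/) { return $prevline; }
    $prevline = $_;
  }
  return '';   # end of input reached without finding the marker
}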
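To illustrate the scanning logic without any PDF at hand, here is a variant of the same idea applied to text held in memory rather than read via <>. The function name extract_changes, the sample text, and the assumption that pdftotext puts each figure on its own line right before 'compared to baseline' are illustrative and not taken from the reports.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Category headings as named in the reports.
my @categories = ('Retail & recreation', 'Grocery & pharmacy', 'Parks',
                  'Transit stations', 'Workplaces', 'Residential');

# Hypothetical helper: return a hash mapping each category found in $text
# to the non-blank line that precedes the next "compared to baseline".
sub extract_changes {
    my ($text) = @_;
    my (%found, $current, $prev);
    for my $line (split /\n/, $text) {
        $line =~ s/^\s+|\s+$//g;       # trim surrounding whitespace
        next if $line eq '';           # ignore blank lines
        if (!defined $current) {
            # Not inside a category yet: look for a heading.
            ($current) = grep { index($line, $_) >= 0 } @categories;
            $prev = undef;
        }
        elsif ($line =~ /compared to baseline/) {
            $found{$current} = $prev;  # the figure is the previous line
            $current = undef;
        }
        else {
            $prev = $line;             # remember the candidate figure
        }
    }
    return %found;
}

# Made-up sample text; only the Retail & recreation figure echoes the post.
my $sample = join "\n",
    'Retail & recreation', '+80%', '-78%', 'compared to baseline',
    'Parks', '-10%', 'compared to baseline';

my %changes = extract_changes($sample);
print "$_: $changes{$_}\n" for sort keys %changes;
```

Unlike the script above, this sketch skips blank lines, so a figure separated from 'compared to baseline' by an empty line would still be picked up.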
Extracted data can be found here.