sobota, 4 kwietnia 2020

Google Community Mobility Reports

Google has launched a new website that uses anonymous location data collected from users of Google products and services to show the level of social distancing taking place in various locations. The COVID-19 Community Mobility Reports web site will show population data trends of six categories: Retail and recreation, grocery and pharmacy, parks, transit stations, workplaces, and residential. The data will track changes over the course of several weeks, and as recent as 48-to-72 hours prior, and will initially cover 131 countries as well as individual counties within certain states. (cf. www.google.com/covid19/mobility/.)

The raports contains charts and comments in the form: NN% compared to baseline (in six above mentioned categories) where NN is a number. It is assumed the number is a percent change at the last date depicted (which accidentaly is a part of a filename). So for example a filename 2020-03-29_PL_Mobility_Report_en.pdf contains a sentence `Retail & recreation -78% compared to baseline` which (probably) means that (somehow) registered traffic at R&R facilities was 22% of the baseline. Anyway those six numbers was extracted for OECD countries (and some other countries) and converted to CSV file.

The conversion was as follows: first PDF files was downloaded with simple Perl script:

#!/usr/bin/perl
# https://www.google.com/covid19/mobility/
use LWP::UserAgent;
use POSIX 'strftime';

my $sleepTime = 11;

%OECD = ('Australia' => 'AU', 'New Zealand' => 'NZ',
'Austria' => 'AT', 'Norway' => 'NO', 'Belgium' => 'BE',
'Poland' => 'PL', 'Canada' => 'CA', 'Portugal' => 'PT',
'Chile' => 'CL', 'Slovak Republic' => 'SK',
## etc ...
);

@oecd = values %OECD;

my $ua = LWP::UserAgent->new(agent => 'Mozilla/5.0', cookie_jar =>{});
my $date = "2020-03-29";

foreach $c (sort @oecd) {
   $PP="https://www.gstatic.com/covid19/mobility/${date}_${c}_Mobility_Report_en.pdf";

   my $req = HTTP::Request->new(GET => $PP);
   my $res = $ua->request($req, "${date}_${c}_Mobility_Report_en.pdf");

   if ($res->is_success) { print $res->as_string; }
   else { print "Failed: ", $res->status_line, "\n"; }
}

Next PDF files was converted to .txt with pdftotext. The relevant fragments of .txt files looks like:

  Retail & recreation
+80%

-78%
compared to baseline

So it looks easy to extract the relevant numbers: scan line-by-line looking for a line with appropriate content (Retail & recreation for example). If found start searching for 'compared to baseline'. If found retrieve previous line:

#!/usr/bin/perl
$file = $ARGV[0];

while (<>) {   chomp();
  if (/Retail \& recreation/ ) { $rr = scan2base(); }
  if (/Grocery \& pharmacy/ ) { $gp = scan2base(); }
  if (/Parks/ ) { $parks = scan2base(); }
  if (/Transit stations/ ) { $ts = scan2base(); }
  if (/Workplaces/ ) { $wps = scan2base(); }
  if (/Residential/ ) { $res = scan2base();
     print "$file;$rr;$gp;$parks;$ts;$wps;$res\n";
     last;  }
}

sub scan2base {
  while (<>) {
   chomp();
   if (/compared to baseline/) {  return ($prevline); }
   $prevline = $_;
  }
}

Extracted data can be found here.

Brak komentarzy:

Prześlij komentarz