treasure-data/embulk-input-google_analytics

View on GitHub
README.md

Summary

Maintainability
Test Coverage
[![CircleCI](https://circleci.com/gh/treasure-data/embulk-input-google_analytics/tree/master.svg?style=svg)](https://circleci.com/gh/treasure-data/embulk-input-google_analytics/tree/master)
[![Code Climate](https://codeclimate.com/github/treasure-data/embulk-input-google_analytics/badges/gpa.svg)](https://codeclimate.com/github/treasure-data/embulk-input-google_analytics)
[![Test Coverage](https://codeclimate.com/github/treasure-data/embulk-input-google_analytics/badges/coverage.svg)](https://codeclimate.com/github/treasure-data/embulk-input-google_analytics/coverage)
[![Issue Count](https://codeclimate.com/github/treasure-data/embulk-input-google_analytics/badges/issue_count.svg)](https://codeclimate.com/github/treasure-data/embulk-input-google_analytics)
[![Gem Version](https://badge.fury.io/rb/embulk-input-google_analytics.svg)](https://badge.fury.io/rb/embulk-input-google_analytics)

# Google Analytics input plugin for Embulk

Embulk input plugin for Google Analytics reports.

## Configuration

- **json_key_content**: See example config.
- **view_id**: View ID for target data. See [Get View ID](#get-view-id) (string, required)
- **time_series**: Only `ga:dateHour` or `ga:date` (string, required)
- **dimensions**: Target dimensions (array, default: `[]` )
- **metrics**: Target metrics (array, default: `[]` )
- **start_date**: Target report start date. Valid format is "YYYY-MM-DD". (string, default: [7 days ago](https://developers.google.com/analytics/devguides/reporting/core/v4/rest/v4/reports/batchGet#reportrequest))
- **end_date**: Target report end date. Valid format is "YYYY-MM-DD". (string, default: [1 day ago](https://developers.google.com/analytics/devguides/reporting/core/v4/rest/v4/reports/batchGet#reportrequest))
- **incremental**: `true` for generate "config_diff" with `embulk run -c config.diff` (bool, default: true)
- **last_record_time**: Ignore fetched records until this time. Mainly for incremental:true. (string, default: nil)
- **retry_limit**: Try to retry this times (integer, default: 5)
- **retry_initial_wait_sec**: Wait seconds for exponential backoff initial value (integer, default: 2)

### **New update from verions  0.1.18**
Started from version 0.1.18, the Plugin also supports User Account Authentication along with Service Account Authentication see: [OAuth 2.0 for Server-side Web Application](https://developers.google.com/identity/protocols/OAuth2WebServer). Extra optional configuration keys were added and the **json_key_content** is made optional
 - **client_id**: client_id for application (string, optional)
 - **client_secret**: client_secret for application (string, optional)
 - **refresh_token**: the refresh_token obtained during exchange authentication code (string, optional)

### Get View ID

1. Go to the [Google Analytics sign in page](https://analytics.google.com/analytics/) and sign in.
1. Click "Admin" tab at left below
1. Select the "Property" using the drop-down menu below ‘Property’.
1. Select ‘View Settings’ beneath ‘View’.
1. The View ID for the selected property is listed first under ‘Basic Settings

### About `json_key_content` option.

You need a service account on Google.

<ol>
  <li>Open the <a href="https://console.developers.google.com/permissions/serviceaccounts"><b>Service accounts</b> page</a>. If prompted,
select a project.</li>
  <li>Click <b>Create service account</b>.</li>
  <li>

    In the <b>Create service account</b> window, type a name for the service
    account, and select <b>Furnish a new private key</b>. If you want to
    <a href="https://developers.google.com/identity/protocols/OAuth2ServiceAccount#delegatingauthority">grant
    Google Apps domain-wide authority</a> to the service account, also select
    <b>Enable Google Apps Domain-wide Delegation</b>.

    Then click <b>Create</b>.</li>
</ol>
From: <https://developers.google.com/identity/protocols/OAuth2ServiceAccount>

Screenshot: ![Service Account](./service_account.png)

## Why the result doesn't match with web interface?

Google Reporting API uses "sampling" data.

- https://developers.google.com/analytics/devguides/reporting/core/v4/basics#sampling
- https://support.google.com/analytics/answer/2637192

That means sometimes result will be unmatched with Google Analytics web interface, and the result is based on sampled data, not all of raw data. This is a Google API's limitation.

Currently a sampling level supported by this plugin is DEFAULT only. Let us know if you want to use other sampling level (SMALL or LARGE).

## Example

```yaml
in:
  type: google_analytics
  json_key_content: |
    {
      "type": "service_account",
      "project_id": "....",
      "private_key_id": "....",
      "private_key": "-----BEGIN PRIVATE KEY-----\n..........................\n-----END PRIVATE KEY-----\n",
      "client_email": ".....",
      "client_id": ".........",
      "auth_uri": "https://accounts.google.com/o/oauth2/auth",
      "token_uri": "https://accounts.google.com/o/oauth2/token",
      "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
      "client_x509_cert_url": ".........."
    }
  view_id: 123111111
  time_series: "ga:dateHour" # hourly basis

  # https://developers.google.com/analytics/devguides/reporting/core/dimsmets
  dimensions:
    - "ga:browser"
  metrics:
    - "ga:visits"
    - "ga:pageviews"

  start_date: "2016-06-27"
  end_date: "2016-06-28"
```

## Config example using User Authentication
```yaml
 in:
  type: google_analytics
  client_id: "#############apps.googleusercontent.com"
  client_secret: "##############QLxgrfis4"
  refresh_token: "##########awWNT9lTeGq8weKE"
  view_id: 123111111
  time_series: "ga:dateHour" # hourly basis
  dimensions:
    - "ga:browser"
  metrics:
    - "ga:visits"
    - "ga:pageviews"

  start_date: "2016-06-27"
  end_date: "2016-06-28"
```

## Build

```
$ rake build
```