# Copyright (c) 2013-2021 Snowplow Analytics Ltd. All rights reserved.
#
# This program is licensed to you under the Apache License Version 2.0,
# and you may not use this file except in compliance with the Apache License Version 2.0.
# You may obtain a copy of the Apache License Version 2.0 at http://www.apache.org/licenses/LICENSE-2.0.
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the Apache License Version 2.0 is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the Apache License Version 2.0 for the specific language governing permissions and limitations there under.

# Author:: Snowplow Analytics Ltd
# Copyright:: Copyright (c) 2013-2021 Snowplow Analytics Ltd
# License:: Apache License Version 2.0


require 'securerandom'
require 'set'

module SnowplowTracker
  # Allows the tracking of events. The tracker accepts event properties to its
  # various `track_x_event` methods, and creates an appropriate event payload.
  # This payload is passed to one or more Emitters for sending to the event
  # collector.
  #
  # A Tracker is always associated with one {Subject}, and one or more
  # {Emitter}. The Subject object stores information about the user, and will be
  # generated automatically if one is not provided during initialization. It can
  # be swapped out for another Subject using {#set_subject}.
  #
  # Tracker objects can access the methods of their associated {Subject}, e.g.
  # {#set_user_id}.
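  #
  # As a brief sketch, assuming `tracker` is an already-initialized Tracker and
  # using illustrative values, the delegated Subject setters can be chained
  # because each one returns the Tracker:
  #
  #     tracker.set_user_id('12345').set_lang('en')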
  #
  # The Emitter, or an array of Emitters, must be given during initialization.
  # They will send the prepared events to the event collector. It's possible to
  # add further Emitters to an existing Tracker, using {#add_emitter}. However,
  # Emitters cannot be removed from Trackers.
  #
  # At initialization, two Tracker parameters can be set which will be added to
  # all events. The first is the Tracker namespace. This is especially useful to
  # distinguish between events from different Trackers, if more than one is
  # being used. The namespace value will be sent as the `tna` field in the raw
  # event, mapping to `name_tracker` in the processed event.
  #
  # The second user-set Tracker property is the app ID (`aid`; `app_id`). This
  # is the unique identifier for the site or application, and is particularly
  # useful for distinguishing between events when Snowplow tracking has been
  # implemented in multiple apps.
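  #
  # As a sketch, two Trackers sharing one Emitter could be told apart by their
  # namespaces. The endpoint, namespaces, and app ID below are placeholders:
  #
  #     emitter = SnowplowTracker::Emitter.new(endpoint: 'collector.example.com')
  #     checkout = SnowplowTracker::Tracker.new(emitters: emitter,
  #                                             namespace: 'checkout',
  #                                             app_id: 'rails_main')
  #     search = SnowplowTracker::Tracker.new(emitters: emitter,
  #                                           namespace: 'search',
  #                                           app_id: 'rails_main')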
  #
  # The final initialization parameter is a setting for the base64-encoding of
  # any JSONs in the event payload. These will be the {SelfDescribingJson}s used
  # to provide context to events, or in the {#track_self_describing_event}
  # method. The default is for JSONs to be encoded. Once the Tracker has been
  # instantiated, it is not possible to change this setting.
  #
  # # Tracking events
  #
  # The Tracker `#track_x_event` methods all work similarly. An event payload is
  # created containing the relevant properties, which is passed to an {Emitter}
  # for sending. All payloads have a unique event ID (`event_id`) added to them
  # (a type-4 UUID created using the SecureRandom module). This is sent as the
  # `eid` field in the raw event.
  #
  # The Ruby tracker provides the ability to track multiple types of events
  # out-of-the-box. The `#track_x_event` methods range from single purpose
  # methods, such as {#track_page_view}, to the more complex but flexible
  # {#track_self_describing_event}, which can be used to track any kind of
  # event. We strongly recommend using {#track_self_describing_event} for your
  # tracking, as it allows you to design custom event types to match your
  # business requirements.
  #
  # This table gives the event type in the raw and processed events, defined in
  # the Snowplow Tracker Protocol. This is the `e` or `event` parameter. Note
  # that {#track_screen_view} calls {#track_self_describing_event} behind the
  # scenes, resulting in a `ue` event type.
  #
  # <br>
  #
  # | Tracker method | `e` (raw) | `event` (processed) |
  # | --- | --- | --- |
  # | {#track_self_describing_event} | `ue` | `unstruct` |
  # | {#track_struct_event} | `se` | `struct` |
  # | {#track_page_view} | `pv` | `page_view` |
  # | {#track_ecommerce_transaction} | `tr` and `ti` | `transaction` and `transaction_item` |
  # | {#track_screen_view} | `ue` | `unstruct` |
  #
  # <br>
  #
  # The name `ue`, "unstructured event", is partially deprecated. This event
  # type was originally created as a counterpart to "structured event", but the
  # name is misleading. An `unstruct` event requires a schema ruleset and
  # therefore can be considered more structured than a `struct` event. We prefer
  # the name "self-describing event", after the {SelfDescribingJson} schema.
  # Changing the event name in the Tracker Protocol would be a breaking change,
  # so for now the self-describing events are still sent as "unstruct".
  #
  # All the `#track_x_event` methods share common features and parameters. Every
  # type of event can have an optional context, {Subject}, and {Page} added. A
  # {Timestamp} can also be provided for all event types to override the default
  # event timestamp.
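  #
  # For example, as a sketch assuming `tracker` is an already-initialized
  # Tracker: a Numeric timestamp is wrapped in a {DeviceTimestamp}, and the
  # properties of an event-specific Subject are merged over those of the
  # Tracker's Subject for that event only. The values shown are illustrative,
  # and the number is assumed to be milliseconds since the Unix epoch.
  #
  #     guest = SnowplowTracker::Subject.new
  #     guest.set_user_id('guest-42')
  #
  #     tracker.track_struct_event(category: 'shop',
  #                                action: 'add-to-basket',
  #                                tstamp: 1_633_596_554_978,
  #                                subject: guest)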
  #
  # [Event
  # context](https://docs.snowplowanalytics.com/docs/understanding-tracking-design/understanding-events-entities/)
  # can be provided as an array of {SelfDescribingJson}. Each element of the
  # array is called an entity. Contextual entities can be used to describe the
  # setting in which the event occurred. For example, a "user" entity could be
  # created and attached to all events from each user. For a search event,
  # entities could be attached for each of the search results. The Ruby tracker
  # does not automatically add any event context. This is in contrast to the
  # [Snowplow JavaScript Tracker](https://docs.snowplowanalytics.com/docs/collecting-data/collecting-from-own-applications/javascript-trackers/),
  # which automatically attaches a "webpage" entity to every event that it tracks,
  # containing a unique ID for that loaded page.
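  #
  # As a minimal sketch of attaching an entity, assuming `tracker` is an
  # already-initialized Tracker; the schema URI and data fields are
  # hypothetical placeholders rather than published schemas:
  #
  #     user_entity = SnowplowTracker::SelfDescribingJson.new(
  #       'iglu:com.example_company/user/jsonschema/1-0-0',
  #       { 'plan' => 'premium' }
  #     )
  #
  #     tracker.track_page_view(page_url: 'www.example.com',
  #                             context: [user_entity])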
  #
  # @see Subject
  # @see Emitter
  # @see
  #   https://docs.snowplowanalytics.com/docs/collecting-data/collecting-from-own-applications/snowplow-tracker-protocol
  #   the Snowplow Tracker Protocol
  # @see
  #   https://docs.snowplowanalytics.com/docs/collecting-data/collecting-from-own-applications/ruby-tracker/tracking-events/
  #   the Snowplow docs page about tracking events
  # @see
  #   https://docs.snowplowanalytics.com/docs/collecting-data/collecting-from-own-applications/ruby-tracker/enriching-your-events/
  #   the Snowplow docs page about adding context and other extra data to events
  # @see
  #   https://docs.snowplowanalytics.com/docs/understanding-tracking-design/introduction-to-tracking-design/
  #   introduction to Snowplow tracking design
  # @api public
  class Tracker
    # @!group Public constants

    # SelfDescribingJson objects are sent encoded by default
    DEFAULT_ENCODE_BASE64 = true

    # @private
    BASE_SCHEMA_PATH = 'iglu:com.snowplowanalytics.snowplow'
    # @private
    SCHEMA_TAG = 'jsonschema'
    # @private
    CONTEXT_SCHEMA = "#{BASE_SCHEMA_PATH}/contexts/#{SCHEMA_TAG}/1-0-1"
    # @private
    UNSTRUCT_EVENT_SCHEMA = "#{BASE_SCHEMA_PATH}/unstruct_event/#{SCHEMA_TAG}/1-0-0"
    # @private
    SCREEN_VIEW_SCHEMA = "#{BASE_SCHEMA_PATH}/screen_view/#{SCHEMA_TAG}/1-0-0"

    # @!endgroup

    # Create a new Tracker. `emitters` is the only strictly required parameter.
    #
    # @param emitters [Emitter, Array<Emitter>] one or more Emitter objects
    # @param subject [Subject] a Subject object
    # @param namespace [String] a name for the Tracker
    # @param app_id [String] the app ID
    # @param encode_base64 [Bool] whether JSONs will be base64-encoded or not
    # @example Initializing a Tracker with all possible options
    #   SnowplowTracker::Tracker.new(
    #               emitters: SnowplowTracker::Emitter.new(endpoint: 'collector.example.com'),
    #               subject: SnowplowTracker::Subject.new,
    #               namespace: 'tracker_no_encode',
    #               app_id: 'rails_main',
    #               encode_base64: false
    #              )
    # @api public
    #
    # @note All the Tracker instance methods return the Tracker object, allowing
    #   method chaining, e.g.
    #   `SnowplowTracker::Tracker.new.set_user_id('12345').track_page_view(page_url: 'www.example.com')`
    def initialize(emitters:, subject: nil, namespace: nil, app_id: nil, encode_base64: DEFAULT_ENCODE_BASE64)
      @emitters = Array(emitters)
      @subject = if subject.nil?
                   Subject.new
                 else
                   subject
                 end
      @settings = {
        'tna' => namespace,
        'tv'  => TRACKER_VERSION,
        'aid' => app_id
      }
      @encode_base64 = encode_base64
    end

    # @!method set_color_depth(depth)
    #   call {Subject#set_color_depth}
    # @!method set_domain_session_id(sid)
    #   call {Subject#set_domain_session_id}
    # @!method set_domain_session_idx(vid)
    #   call {Subject#set_domain_session_idx}
    # @!method set_domain_user_id(duid)
    #   call {Subject#set_domain_user_id}
    # @!method set_fingerprint(fingerprint)
    #   call {Subject#set_fingerprint}
    # @!method set_ip_address(ip)
    #   call {Subject#set_ip_address}
    # @!method set_lang(lang)
    #   call {Subject#set_lang}
    # @!method set_network_user_id(nuid)
    #   call {Subject#set_network_user_id}
    # @!method set_platform(platform)
    #   call {Subject#set_platform}
    # @!method set_screen_resolution(width:, height:)
    #   call {Subject#set_screen_resolution}
    # @!method set_timezone(timezone)
    #   call {Subject#set_timezone}
    # @!method set_user_id(user_id)
    #   call {Subject#set_user_id}
    # @!method set_useragent(useragent)
    #   call {Subject#set_useragent}
    # @!method set_viewport(width:, height:)
    #   call {Subject#set_viewport}
    Subject.instance_methods(false).each do |name|
      if RUBY_VERSION >= '3.0.0'
        define_method name, ->(*args, **kwargs) do
          @subject.method(name.to_sym).call(*args, **kwargs)

          self
        end
      else
        define_method name, ->(*args) do
          @subject.method(name.to_sym).call(*args)

          self
        end
      end
    end

    # Generates a type-4 UUID to identify this event
    # @private
    def event_id
      SecureRandom.uuid
    end

    # Builds a single self-describing JSON from an array of custom contexts
    # @private
    def build_context(context)
      SelfDescribingJson.new(
        CONTEXT_SCHEMA,
        context.map(&:to_json)
      ).to_json
    end

    # Sends the payload hash as a request to the Emitter(s)
    # @private
    def track(payload)
      @emitters.each { |emitter| emitter.input(payload.data) }

      nil
    end

    # Ensures that either a DeviceTimestamp or TrueTimestamp is associated with
    # every event.
    # @private
    def process_tstamp(tstamp)
      tstamp = Timestamp.create if tstamp.nil?
      tstamp = DeviceTimestamp.new(tstamp) if tstamp.is_a? Numeric
      tstamp
    end

    # Attaches the more generic fields to the event payload. This includes
    # context, Subject, and Page if they are present. The timestamp is added, as
    # well as all fields from `@settings`.
    #
    # Finally, the Tracker generates and attaches an event ID.
    # @private
    def finalise_payload(payload, context, tstamp, event_subject, page)
      payload.add_json(build_context(context), @encode_base64, 'cx', 'co') unless context.nil? || context.empty?
      payload.add_hash(page.details) unless page.nil?

      if event_subject.nil?
        payload.add_hash(@subject.details)
      else
        payload.add_hash(@subject.details.merge(event_subject.details))
      end

      payload.add(tstamp.type, tstamp.value)
      payload.add_hash(@settings)
      payload.add('eid', event_id)

      nil
    end

    # Track a visit to a page.
    # @example
    #   SnowplowTracker::Tracker.new.track_page_view(page_url: 'www.example.com',
    #                                                page_title: 'example',
    #                                                referrer: 'www.referrer.com')
    #
    # @param page_url [String] the URL of the page
    # @param page_title [String] the page title
    # @param referrer [String] the URL of the referrer page
    # @param context [Array<SelfDescribingJson>] an array of SelfDescribingJson objects
    # @param tstamp [DeviceTimestamp, TrueTimestamp, Num] override the default DeviceTimestamp of the event
    # @param subject [Subject] event-specific Subject object
    # @param page [Page] override the page_url, page_title, or referrer
    #
    # @api public
    def track_page_view(page_url:, page_title: nil, referrer: nil,
                        context: nil, tstamp: nil, subject: nil, page: nil)
      tstamp = process_tstamp(tstamp)

      payload = Payload.new
      payload.add('e', 'pv')
      payload.add('url', page_url)
      payload.add('page', page_title)
      payload.add('refr', referrer)

      finalise_payload(payload, context, tstamp, subject, page)
      track(payload)

      self
    end

    # Track an eCommerce transaction, and all the items in it.
    #
    # This method is unique in sending multiple events: one `transaction` event,
    # and one `transaction_item` event for each item. If Subject or Page objects
    # are provided, their parameters will be merged into both `transaction` and
    # `transaction_item` events. The timestamp of each `transaction_item` event
    # will always be the same as that of the `transaction`, although every event
    # is given its own event ID.
    # Transaction items are also automatically populated with the `order_id` and
    # `currency` fields from the transaction.
    #
    # Event context is handled differently for `transaction` and
    # `transaction_item` events. A context array argument provided to this
    # method will be attached to the `transaction` event only. To attach a
    # context array to a transaction item, use the key "context" in the item
    # hash.
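    #
    # As a sketch of the item-level context, assuming `tracker` is an
    # already-initialized Tracker and using a hypothetical schema URI:
    #
    #     gift_wrap = SnowplowTracker::SelfDescribingJson.new(
    #       'iglu:com.example_company/gift_wrap/jsonschema/1-0-0',
    #       { 'wrapped' => true }
    #     )
    #
    #     tracker.track_ecommerce_transaction(
    #       transaction: { 'order_id' => '12345', 'total_value' => 19.99 },
    #       items: [
    #         { 'sku' => 'pbz0026', 'price' => 19.99, 'quantity' => 1,
    #           'context' => [gift_wrap] }
    #       ]
    #     )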
    #
    # The transaction and item hash arguments must contain the correct keys, as
    # shown in the tables below.
    #
    # | Transaction fields | Description | Required? | Type |
    # | --- | --- | --- | --- |
    # | order_id |  ID of the eCommerce transaction  | Yes  | String |
    # | total_value |  Total transaction value  | Yes  | Num |
    # | affiliation |  Transaction affiliation  | No  | String |
    # | tax_value |  Transaction tax value  | No  | Num |
    # | shipping |  Delivery cost charged  | No  | Num |
    # | city |  Delivery address city  | No  | String |
    # | state |  Delivery address state  | No  | String |
    # | country |  Delivery address country  | No  | String |
    # | currency |  Transaction currency  | No  | String |
    #
    # <br>
    #
    # | Item fields | Description | Required? | Type |
    # | --- | --- | --- | --- |
    # | sku | Item SKU  | Yes | String |
    # | price | Item price  | Yes | Num |
    # | quantity |  Item quantity | Yes | Integer |
    # | name |  Item name | No |  String |
    # | category |  Item category | No |  String |
    # | context | Item event context  | No |  Array[{SelfDescribingJson}] |
    #
    # @example Tracking a transaction containing two items
    #   SnowplowTracker::Tracker.new.track_ecommerce_transaction(
    #     transaction: {
    #       'order_id' => '12345',
    #       'total_value' => 49.99,
    #       'affiliation' => 'my_company',
    #       'tax_value' => 0,
    #       'shipping' => 0,
    #       'city' => 'Phoenix',
    #       'state' => 'Arizona',
    #       'country' => 'USA',
    #       'currency' => 'USD'
    #     },
    #     items: [
    #       {
    #         'sku' => 'pbz0026',
    #         'price' => 19.99,
    #         'quantity' => 1
    #       },
    #       {
    #         'sku' => 'pbz0038',
    #         'price' => 15,
    #         'quantity' => 2,
    #         'name' => 'crystals',
    #         'category' => 'magic'
    #       }
    #     ]
    #   )
    #
    # @param transaction [Hash] the correctly structured transaction hash
    # @param items [Array<Hash>] an array of correctly structured item hashes
    # @param context [Array<SelfDescribingJson>] an array of SelfDescribingJson objects
    # @param tstamp [DeviceTimestamp, TrueTimestamp, Num] override the default DeviceTimestamp of the event
    # @param subject [Subject] event-specific Subject object
    # @param page [Page] event-specific Page object
    #
    # @api public
    def track_ecommerce_transaction(transaction:, items:,
                                    context: nil, tstamp: nil,
                                    subject: nil, page: nil)
      tstamp = process_tstamp(tstamp)

      transform_keys(transaction)

      payload = Payload.new
      payload.add('e', 'tr')
      payload.add('tr_id', transaction['order_id'])
      payload.add('tr_tt', transaction['total_value'])
      payload.add('tr_af', transaction['affiliation'])
      payload.add('tr_tx', transaction['tax_value'])
      payload.add('tr_sh', transaction['shipping'])
      payload.add('tr_ci', transaction['city'])
      payload.add('tr_st', transaction['state'])
      payload.add('tr_co', transaction['country'])
      payload.add('tr_cu', transaction['currency'])

      finalise_payload(payload, context, tstamp, subject, page)

      track(payload)

      items.each do |item|
        transform_keys(item)
        item['tstamp'] = tstamp
        item['order_id'] = transaction['order_id']
        item['currency'] = transaction['currency']
        track_ecommerce_transaction_item(item, subject, page)
      end

      self
    end

    # Makes sure all hash keys are strings rather than symbols.
    # The Ruby core language added a method for this in Ruby 2.5.
    # @private
    def transform_keys(hash)
      hash.keys.each { |key| hash[key.to_s] = hash.delete key }
    end

    # Track a single item within an ecommerce transaction.
    # @private
    def track_ecommerce_transaction_item(details, subject, page)
      payload = Payload.new
      payload.add('e', 'ti')
      payload.add('ti_id', details['order_id'])
      payload.add('ti_sk', details['sku'])
      payload.add('ti_pr', details['price'])
      payload.add('ti_qu', details['quantity'])
      payload.add('ti_nm', details['name'])
      payload.add('ti_ca', details['category'])
      payload.add('ti_cu', details['currency'])

      finalise_payload(payload, details['context'], details['tstamp'], subject, page)
      track(payload)

      self
    end

    # Track a structured event. `category` and `action` are required.
    #
    # This event type can be used to track many types of user activity, as it is
    # somewhat customizable. This event type is provided particularly for
    # concordance with Google Analytics tracking, where events are structured by
    # "category", "action", "label", and "value".
    #
    # For fully customizable event tracking, we recommend you use
    # self-describing events.
    #
    # @example
    #   SnowplowTracker::Tracker.new.track_struct_event(
    #     category: 'shop',
    #     action: 'add-to-basket',
    #     property: 'pcs',
    #     value: 2
    #   )
    #
    #
    # @see #track_self_describing_event
    #
    # @param category [String] the event category
    # @param action [String] the action performed
    # @param label [String] an event label
    # @param property [String] an event property
    # @param value [Num] a value for the event
    # @param context [Array<SelfDescribingJson>] an array of SelfDescribingJson objects
    # @param tstamp [DeviceTimestamp, TrueTimestamp, Num] override the default DeviceTimestamp of the event
    # @param subject [Subject] event-specific Subject object
    # @param page [Page] event-specific Page object
    #
    # @api public
    def track_struct_event(category:, action:, label: nil, property: nil,
                           value: nil, context: nil, tstamp: nil, subject: nil, page: nil)
      tstamp = process_tstamp(tstamp)

      payload = Payload.new
      payload.add('e', 'se')
      payload.add('se_ca', category)
      payload.add('se_ac', action)
      payload.add('se_la', label)
      payload.add('se_pr', property)
      payload.add('se_va', value)

      finalise_payload(payload, context, tstamp, subject, page)
      track(payload)

      self
    end

    # Track a screen view event. Note that while the `name` and `id` parameters
    # are both optional, you must provide at least one of them to create a
    # valid event.
    #
    # This method creates an `unstruct` event, by creating a
    # {SelfDescribingJson} and calling {#track_self_describing_event}. The
    # schema ID for this is
    # "iglu:com.snowplowanalytics.snowplow/screen_view/jsonschema/1-0-0", and
    # the data field will contain the name and/or ID.
    #
    # @example
    #   SnowplowTracker::Tracker.new.track_screen_view(name: 'HUD > Save Game',
    #                                                  id: 'screen23')
    #
    #
    # @see #track_page_view
    # @see #track_self_describing_event
    #
    # @param name [String] the screen name (human readable)
    # @param id [String] the unique screen ID
    # @param context [Array<SelfDescribingJson>] an array of SelfDescribingJson objects
    # @param tstamp [DeviceTimestamp, TrueTimestamp, Num] override the default DeviceTimestamp of the event
    # @param subject [Subject] event-specific Subject object
    # @param page [Page] event-specific Page object
    #
    # @api public
    def track_screen_view(name: nil, id: nil, context: nil, tstamp: nil, subject: nil, page: nil)
      screen_view_properties = {}
      screen_view_properties['name'] = name unless name.nil?
      screen_view_properties['id'] = id unless id.nil?

      event_json = SelfDescribingJson.new(SCREEN_VIEW_SCHEMA, screen_view_properties)
      track_unstruct_event(event_json: event_json, context: context,
                           tstamp: tstamp, subject: subject, page: page)

      self
    end

    # Track a self-describing event. These are custom events based on
    # {SelfDescribingJson}, i.e. a JSON schema and a defined set of properties.
    #
    # This is useful for tracking specific or proprietary event types, or events
    # with unpredictable or frequently changing properties.
    #
    # This method creates an `unstruct` event type. It is actually an alias for
    # {#track_unstruct_event}, which is deprecated due to its unhelpful name.
    #
    # @example
    #   self_desc_json = SnowplowTracker::SelfDescribingJson.new(
    #     "iglu:com.example_company/save_game/jsonschema/1-0-2",
    #     {
    #       "saveId" => "4321",
    #       "level" => 23,
    #       "difficultyLevel" => "HARD",
    #       "dlContent" => true
    #     }
    #   )
    #
    #   SnowplowTracker::Tracker.new.track_self_describing_event(event_json: self_desc_json)
    #
    #
    # @param event_json [SelfDescribingJson] a SelfDescribingJson object
    # @param context [Array<SelfDescribingJson>] an array of SelfDescribingJson objects
    # @param tstamp [DeviceTimestamp, TrueTimestamp, Num] override the default DeviceTimestamp of the event
    # @param subject [Subject] event-specific Subject object
    # @param page [Page] event-specific Page object
    #
    # @api public
    def track_self_describing_event(event_json:, context: nil, tstamp: nil, subject: nil, page: nil)
      track_unstruct_event(event_json: event_json, context: context,
                           tstamp: tstamp, subject: subject, page: page)
    end

    # @deprecated Use {#track_self_describing_event} instead.
    #
    # @api public
    def track_unstruct_event(event_json:, context: nil, tstamp: nil, subject: nil, page: nil)
      tstamp = process_tstamp(tstamp)

      payload = Payload.new
      payload.add('e', 'ue')

      envelope = SelfDescribingJson.new(UNSTRUCT_EVENT_SCHEMA, event_json.to_json)
      payload.add_json(envelope.to_json, @encode_base64, 'ue_px', 'ue_pr')

      finalise_payload(payload, context, tstamp, subject, page)
      track(payload)

      self
    end

    # Manually flush all events stored in all Tracker-associated Emitters. By
    # default, this happens synchronously. {Emitter}s can only send events
    # synchronously, while {AsyncEmitter}s can send either synchronously or
    # asynchronously.
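    #
    # A minimal sketch, assuming `tracker` is an existing Tracker whose
    # Emitters include at least one {AsyncEmitter}:
    #
    # @example Flushing buffered events
    #   tracker.flush              # synchronous, the default
    #   tracker.flush(async: true) # asynchronous where the Emitter supports it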
    #
    # @param async [Bool] whether to flush asynchronously or not
    #
    # @api public
    def flush(async: false)
      @emitters.each do |emitter|
        emitter.flush(async)
      end

      self
    end

    # Replace the existing Tracker-associated Subject with the provided one. All
    # subsequent events will have the properties of the new Subject, unless they
    # are overridden by event-specific Subject parameters.
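    #
    # A minimal sketch, assuming `tracker` is an existing Tracker; the user ID
    # is an illustrative value:
    #
    # @example Swapping in a new Subject
    #   new_subject = SnowplowTracker::Subject.new
    #   new_subject.set_user_id('12345')
    #   tracker.set_subject(new_subject)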
    #
    # @param subject [Subject] a Subject object
    #
    # @api public
    def set_subject(subject)
      @subject = subject
      self
    end

    # Add a new Emitter to the internal array of Tracker-associated Emitters.
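    #
    # A minimal sketch, assuming `tracker` is an existing Tracker; the endpoint
    # is a placeholder:
    #
    # @example Adding a second Emitter
    #   backup = SnowplowTracker::Emitter.new(endpoint: 'collector2.example.com')
    #   tracker.add_emitter(backup)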
    #
    # @param emitter [Emitter] an Emitter object
    #
    # @api public
    def add_emitter(emitter)
      @emitters.push(emitter)
      self
    end

    private :build_context,
            :track,
            :track_ecommerce_transaction_item
  end
end