Optimizely Data Export Services

Customers with Business or Enterprise plans can use one of two Optimizely data export services to analyze either the raw events or results data of their experiments.

Overview

Optimizely provides two data export services for the Optimizely X Web, Full Stack, and Personalization products:

The diagram below provides a high-level view of the two data export services. Optimizely stores both data sources on AWS S3 and runs a daily data mining job that exports all necessary records per export service created in the last 24 hours (12:00 AM-11:59 PM UTC) to AWS S3 on a daily basis. You can access the data programmatically via Amazon’s API/CLI/SDK using secure credentials provisioned by Optimizely.

Data Export Services Diagram

The export comparison table below lists the key differences between the two export services.

Raw Events Export Results Export
Scope

All events received by Optimizely customer with the exact (unprocessed) data sent by the client.

All conversions Optimizely has attributed and counted in the results; includes server processing information such as easy event attribution and session data.

Data

Raw events

Results records

Schema

TSV

Apache Parquet

Analysis Type

Event-level

Metrics-level

Successful Export

Indicated by a `status.yaml`.

Indicated by a `_SUCCESS` file.

Example Queries

“query events sent by IP a.b.c.d” and “query events with orderId=xyz”

“query revenue per visitor” and “query conversions per session”

Use Cases

Inspect all events sent by a visitor. Retrieve conversions that have a specific attribute or tag.

Combine Optimizely data with other data sources to measure experimentation impact on external metrics. Analyze results using SQL queries. Build custom dashboards using tools like Tableau and Chartio.

Access Optimizely Export Data via Amazon S3

Results Export and Raw Events Export are only available to customers with Business and Enterprise plans.

Optimizely can provide you with a set of credentials you can use to access our Amazon S3 bucket. We need to transmit these credentials in a secure way—so emailing them won't suffice! We use PGP encryption to facilitate secure transmission for Amazon S3 credentials. Here's the step-by-step process:

  1. Generate a public and private PGP key.
  2. File a support ticket to request access to a data export service. Make sure to include your public PGP key in your ticket.
  3. Optimizely Support will send you an encrypted file with your Amazon S3 credentials.
  4. Decrypt the credentials from Optimizely Support.

Validate Your Credentials

Next, you'll need to verify that the credentials from Optimizely Support provide you with access to the export data:

  1. Install AWS command line tools.
  2. Run aws configure to input your access key and secret access key.
  3. List the files in your directory with the correct command:
    • Results Export: aws s3 ls s3://optimizely-rex/<account id>/
    • Raw Events Export: aws s3 ls s3://optimizely-export-ng/<account id>/

Note: The final forward-slash is necessary because your credentials will only provide access to one folder inside the Optimizely Amazon S3 bucket. Here's a link where you can access your account ID.

You should see a list of your projects that have had running experiments:

us-troddy:~ troddy$ aws s3 ls s3://optimizely-export-ng/2779270234/
                           PRE 3219400867/
                           PRE 7620220811/
                           PRE 8431340120/
us-troddy:~ troddy$

If you see the following message:

An error occurred (AccessDenied) when calling the ListObjects operation: Access Denied

Export Data Location

If you’re retrieving Results Export files, see Results Export: AWS S3. If you’re retrieving Raw Events Export files, see Raw Data Export: AWS S3. You can do a recursive list operation to see the full structure within your directory.

See also the KB article Access Optimizely data exports.

Retention Policy

To comply with GDPR requirements, Optimizely retains the files in your Data Export bucket for both export services for 30 days. Older data is automatically deleted. To retain the data for a longer period of time, ensure that your import process archives the files to your data warehouse at least once every 30 days.

Encryption

Your Data Export data is encrypted. To access the data, you need one of the following clients, at minimum:

Resources

For more information, see:


Results Export

The Results Export contains all conversions attributed and counted in the results, and provides a view of your experimentation data that comes with these important benefits:

Session results example

For more information, such as appropriate use cases for each export, see the export comparison table in the Overview.

Get Started

See Access Optimizely Export Data via Amazon S3.

Technical Details

The Results Export process generates files containing the results records created in the past 24 hours (00:00 - 23:59 UTC). The files are stored in Apache Parquet format, and the partitioning scheme is:

AWS S3

This section describes the Results Export files that you’ll retrieve from your Optimizely AWS S3 bucket.

S3 Path:

s3://optimizely-rex/{accountId}/results/yyyy/mm/dd/{experimentId}/{fileName}

Legend:

Example:

optimizely-rex/123456789/results/2018/11/30/456789123/results/record-r-00001.snappy.parquet

Notes:

s3://optimizely-rex/{account_id}/results-holdback/yyyy/mm/dd/{experimentId}/{fileName}

Status File

The daily partition files are ready for import when the _SUCCESS file is available at either:

Optimizely recommends that you check for the presence of this file at regular intervals. If it is present, it is safe to start importing the daily partition data.

Schema Field Descriptions

Each row in the export is a results record that stores session-aggregated conversion data. The table provides descriptions for the fields in the Results Export schema.

Retrieve identifiers

To retrieve the campaignId, experimentId and variationId:

To retrieve the eventId: The event id can be found in the Manage Events dialog in the Optimizely application, or you can programmatically retrieve it via the REST API.

Field name Type Description
id

string

Unique record identifier.

timestamp

long

The record timestamp represented as the number of milliseconds since Unix epoch. It roughly corresponds to the client timestamp of the first event in the record.

accountId

long

Unique account identifier.

projectId

long

Unique project identifier.

visitorId

string

The visitor identifier. Web Experimentation and Personalization: The optimizelyEndUserId value stored in the Optimizely cookie. Full Stack: The user ID provided by your app or service.

sessionId

int

Unique session identifier. Optimizely generates this identifier during the event sessionization process.

campaignId

string

The unique campaign identifier. For Web and Full Stack A/B experimentation, the campaignId is always equal to experimentId. See also Retrieve identifiers.

experimentId

long

Unique experiment identifier. See also Retrieve identifiers.

variationId

string

Unique variation identifier. See also Retrieve identifiers.

eventId

long

Unique event identifier; is the event entity Id in some Optimizely X products. See also Retrieve identifiers. By convention, Impression events have event_id equal to experiment_id. Filter by experiment_id to calculate total impression events for the results set. Overall Revenue events have event_id equal to -1. Filter by -1 to calculate Overall Revenue for the results set.

count

int

The count of conversions in the record.

revenue

long

The total revenue summed across all events in the record.

value

float

The total value summed across all events in the record; used for total value and other numeric metrics.

segments

array<id(long):value(string)>

An array of segments in the form of segment_id:value. Note: The array contains default or custom attributes attributed to the visitor at that session, enabling you to segment your results by one or more of these attributes. See Segment your results in Optimizely X Web and Custom Attributes: Capture visitor data through the API in Optimizely X.

receivedTimestamp

long

The timestamp of when the record was created represented as the number of milliseconds since Unix epoch. It approximates the time Optimizely received the first event in the record.

Usage Notes

Important!: The Results Export exports a daily copy of the results data. Any changes made to the Results page after a daily export has taken place will be reflected on the Results page retroactively, but will not be included in the exported data. Examples of this are:

When using the Results Export to analyze cross-product results, be aware of these differences:

See the examples in SQL Examples for Common Metrics Calculations. For more information on how Optimizely counts conversions and understanding the data pipeline, see the KB articles noted in Resources.

See also our Retention Policy.

Sample File

You can download a sample .tsv file from this link: Results data export file.

SQL Examples for Common Metrics Calculations

The section provides examples in SQL for some common metrics you might want to calculate for your A/B Experiments or Personalization Campaigns. The examples assume the records have been imported to a SQL table called table.

Note: The first example in each section shows how to calculate impression usage.

A/B Experiments

Query Impressions by variation for experimentId 11818160790 and eventId 11351730352:

Note: By convention, impression events have eventId equal to experimentId.

SELECT SUM(count), variationId FROM table
WHERE experimentId = 11818160790 and timestamp between 1554152247000 and 1556657847000
AND eventId = experimentId
GROUP BY variationId

Query Unique Visitors by variation for experimentId 11818160790:

SELECT COUNT(distinct visitorId), variationId FROM table
WHERE experimentId = 11818160790 and experimentId=eventId and timestamp between 1555599262169 and 1555743600000
GROUP BY variationId

Query Unique Conversions by variation for experimentId 11818160790 and eventId 11351730352:

SELECT COUNT(distinct visitorId), variationId FROM table
WHERE experimentId = 11818160790 and timestamp between 1554152247000 and 1556657847000
AND eventId = 11351730352
GROUP BY variationId

Query Total Conversions by variation for experimentId 11818160790 and eventId 11351730352:

SELECT SUM(count), variationId FROM table
WHERE experimentId = 11818160790 and timestamp between 1554152247000 and 1556657847000
AND eventId = 11351730352
GROUP BY variationId

Query Total Revenue by variation for experimentId 11818160790 and eventId 11351730352:

SELECT SUM(revenue), variationId FROM table
WHERE experimentId = 11818160790 and timestamp between 1554152247000 and 1556657847000
AND eventId = 11351730352
GROUP BY variationId

Query Total (Numeric) Value by variation for experimentId 11818160790 and eventId 11351730352:

SELECT SUM(value), variationId FROM table
WHERE experimentId = 11818160790 and timestamp between 1554152247000 and 1556657847000
AND eventId = 11351730352
GROUP BY variationId

Personalization Campaigns

Query Impressions by Personalization experience for campaignId 11818160790 and eventId 11351730352:

Note: By convention, impression events have eventId equal to experimentId.

SELECT SUM(count), experimentId FROM table
WHERE campaignId = 11818160790 and timestamp between 1554152247000 and 1556657847000
AND eventId = experimentId
GROUP BY experimentId

Query Unique Sessions by Personalization experience for campaignId 11818160790 within a specified time frame:

SELECT COUNT(distinct sessionId), experimentId FROM table
WHERE campaignId = 11818160790 and experimentId=eventId and timestamp between 1554152247000 and 1556657847000
GROUP BY experimentId

Query Unique Conversions by Personalization experience for campaignId 11818160790 and eventId 11351730352:

SELECT COUNT(distinct sessionId), experimentId FROM table
WHERE campaignId = 11818160790 and timestamp between 1554152247000 and 1556657847000
AND eventId = 11351730352
GROUP BY experimentId

Query Total Conversions by Personalization experience for campaignId 11818160790 and eventId 11351730352:

SELECT SUM(count), experimentId FROM table
WHERE campaignId = 11818160790 and timestamp between 1554152247000 and 1556657847000
AND eventId = 11351730352
GROUP BY experimentId

Query Total Revenue by Personalization experience for campaignId 11818160790 and eventId 11351730352:

SELECT SUM(revenue), experimentId FROM table
WHERE campaignId = 11818160790 and timestamp between 1554152247000 and 1556657847000
AND eventId = 11351730352
GROUP BY experimentId

Query Total (Numeric) Value by Personalization experience for campaignId 11818160790 and eventId 11351730352:

SELECT SUM(value), experimentId FROM table
WHERE campaignId = 11818160790 and timestamp between 1554152247000 and 1556657847000
AND eventId = 11351730352
GROUP BY experimentId

Raw Events Export

The Raw Events Export contains the raw list of the events Optimizely received, separated by experiment ID. It is not a list of raw results data. Event data goes through an attribution process as described in How Optimizely counts conversions.

Important! Because Raw Events Export data is pre-attribution and unprocessed, recreating the Results page numbers out of Raw Events Export takes significant effort: it requires recreating our attribution model in your queries. Optimizely doesn't provide support in matching Raw Events Export data with results. To confirm or recalculate results, we strongly recommend using the Results Export instead. The Results Export is an exact copy of the data that Optimizely uses to compute the Results page. For more information, including appropriate use cases for each export, see the Export Comparison Table.

Get Started

See Access Optimizely Export Data via Amazon S3.

Important! The Raw Events Export files represent a raw output of all events that occur in the browser. We recommend that customers import all records. Each and every event the visitor initiates on your site or app is captured. For example, this would include five records if the visitor landed on your homepage five times. Optimizely doesn't reproduce rows in the same or subsequent files unless that event actually happened again.

Technical Details

The Raw Events Export process generates multiple files containing raw event data collected in the past 24 hours (00:00 - 23:59 UTC). The files are tab-delimited and compressed with gzip for faster download. The first file in each daily partition includes a header row.

AWS S3

This section describes the Raw Events Export files that you’ll retrieve from your Optimizely AWS S3 bucket. See the AWS documentation for more information about making requests to Amazon S3.

The S3 file location follows the format:

s3://optimizely-export-ng/{accountId}/{projectId}/2.0/yyyy/mm/dd/{experimentId}/{fileName}

Legend:

Example: 987654321-0-2017-03-06-r-00062.gz

Notes:

Status File

The daily partition files are ready for import when the status.yaml file is available at

s3://optimizely-export-ng/{accountId}/{projectId}/2.0/yyyy/mm/dd/status.yml

A status file (status.yaml) is included within each daily partition to track the success or failure of the Raw Events Export job. The status files contain the following information:

View a sample YAML file with and without failed export files.

Export Process Notes

Schema Field Descriptions

As these files are TSVs, nulls will be empty tabs.

Field Type Description
timestamp

positive integer

The timestamp of when the event occurred in the browser or app. The format is a number representing the number of seconds since Unix epoch.

project_id

integer

Your Optimizely project ID on which the campaign and/or experiment lives.

campaign_id

integer

The campaign ID (also known as layer ID). Web Experimentation and Personalization: Value can be found in the API Names tab. Full Stack: Value can be found in the project's JSON data file.

experiment_id

integer

The experiment ID. Web Experimentation and Personalization: Value can be found in the API Names tab. Full Stack: Value can be found in the project's JSON data file.

variation_id

integer

The ID Optimizely uses to identify the variation the visitor saw. Web Experimentation and Personalization: Value can be found in the API Names tab. Full Stack: Value can be found in the project's JSON data file.

layer_holdback

string

Boolean value that indicates whether the visitor was placed in the campaign's or experiment's holdback group. Values are either true or false.

audience_names

alphanumeric string, representing an array

An array containing the name of the audience for which the visitor qualified to be placed in the campaign and experiment. Web Experimentation and Personalization: if your snippet masks descriptive names, this will be the audience ID (of the form [Aud 1234567890]). It can be mapped to Audience Name on the Campaign Overview screen, API Names tab.Full Stack: This mapping is available in the project's JSON data file.

end_user_id

alphanumeric string

Alphanumeric string concatenated with a Unix timestamp. Example: oeu1460584472759r0.9885484367665214 Web Experimentation and Personalization: This is the anonymous optimizelyEndUserId value stored in a cookie and local storage. It represents a unique visitor. Full Stack: This is the user ID provided by your app.

uuid

tring

Ignore; null. uuid is not currently supported in Optimizely X.

session_id

alphanumeric string

A unique session identifier. Web Experimentation and Personalization: Set to AUTO by default. Full Stack: This is null and can be ignored.

snippet_revision

integer

Web Experimentation and Personalization: The revision number of the Optimizely snippet that was served in this visitor's browser. Full Stack: The revision number of your datafile that was compiled into the SDK at the time of event firing.

user_ip

IPv6 address format

IP address of the visitor associated with this tracking call. If you employ IP Anonymization, the last octet will be a 0 (zero) for all tracking calls made to Optimizely. The full IP address will not be stored anywhere and cannot be retrieved later.

user_agent

alphanumeric string

Web Experimentation and Personalization: The userAgent header passed from the browser. Full Stack: The package or code language that initiated this tracking call.

user_engine

alpha string

Language or stack in which the Optimizely snippet or SDK was served. For example, a value of js will be shown for the web snippet.

referer

alphanumeric URL

Web Experimentation and Personalization: The referring URL in the browser. Full Stack: This will be null and can be ignored.

global_holdback

alpha string

Ignore; will always be false. A global holdback is not currently supported in Optimizely X.

event_type

alpha string

Web Experimentation and Personalization: The type of event recorded by Optimizely. Values are view_activated or other. view_activated indicates the activation of a page (view), and other could be a click or custom event. Refer to the event_name column for more details. Full Stack: This will be null and can be ignored. For all products, if the row represents a bucketing decision event, this field will be null.

event_name

alphanumeric string or integer

The API name of the click or custom event. event_name is an alphanumeric string if event_type is other, or will be an integer if event_type is view_activated. Web Experimentation and Personalization: If event_type equals view_activated, this value will be the page ID. All products: If the row represents a bucketing decision event, this field will be null.

user_features

large alphanumeric string, representing an array of JSON objects

Web Experimentation and Personalization: An array of JSON objects of Optimizely customer-defined behavioral attributes (if Personalization is enabled), custom dimensions and/or user attributes, and Optimizely standard segments. Each object will have a type, a name, and a value. These values are all optional. Full Stack: This will be an array of JSON objects containing customer-defined attributes. Optimizely default segments include: first_session; browser_id; AdWords campaign value: if source_type is campaign; device; source_type: traffic source; timestamp: In seconds since Unix epoch; offset: Number of minutes behind UTC, serves as an indicator of timezone in which the event was fired.

active_views

string

Deprecated. For all products, this field is null.

event_features

large alphanumeric string, representing an array of JSON objects

Web Experimentation and Personalization: An array of JSON objects of any page or event tags or categories defined for this event. Full Stack: An array of JSON objects containing customer-defined tags. For all products, if the row represents a bucketing decision event, this field will be null.

event_metrics

alphanumeric string, representing an array of JSON objects

If revenue is captured for this event, a JSON object array indicating revenue as the name and the value in cents. Note: revenue will never be a decimal. Optimizely preserves the converted value into cents as an integer. For all products, if the row represents a bucketing decision event, this field will be null.

event_uuid

alphanumeric string

A unique identifier for this event. Clients usually set this value with any UUID-generating method. The field can be used to de-duplicate events that are accidentally or erroneously replayed.

Usage Notes

For Optimizely X Web Experimentation and Personalization Data Export files:

See also Retention Policy, Encryption, and Resources.

Sample Files

You can download sample .tsv files from these links:

Troubleshooting

I don't see a status.yaml or _SUCCESS file.

If you do not see a status.yaml or _SUCCESS file, the raw data is still copying into the S3 bucket. When the status.yaml and _SUCCESS file appear, the process is complete and it is safe to begin copying the raw data.

The status.yaml or _SUCCESS file says export failed.

Please file a support ticket so we can look into this issue.

I need a script or code for accessing my data.

We do not recommend specific programs for accessing raw data through Amazon S3 or provide scripts for data retrieval. You may need technical help on your end to develop a solution that meets your business needs.

My raw data doesn't match the results page.

It's important to understand that the raw event data is not raw results data. The data we provide access to is a raw list of events we received — it is not a list of raw results data. Event data goes through an attribution process described in our knowledge base here. Our raw data comes from before attribution occurs.

Raw event data will contain events from users who may or may not count for an experiment. Events may exist in exports outside of the time frame in which an experiment ran. Re-creating the results page's numbers out of raw data will be non-trivial as it would require re-creating our entire attribution model in your queries.

The password column is missing in the credentials file I received from technical support.

We do not provide access to the AWS web console which is why there is nothing in the password column. All you need is the AWS_ACCESS_KEY_ID and the AWS_SECRET_ACCESS_KEY to access your data.