LUXSCI API: REPORTING BASICS

Report Querying and Results: Commonalities

All API queries for “Reports” are HTTP GETs with optional query strings that allow you to narrow down the scope of the request and/or select which of the many matches to actually return. The following table lists the optional query string parameters that you can use with any report request. The meaning of “date” referred to in the date restrictions will depend on what API call is being executed.

Date and Row Range Selection

Keyword	Date Type	Description
`after_date`	DATETIME	Return only matches dated after this date time. Format “YYYY-MM-DD HH:MM:SS” in GMT.
`before_date`	DATETIME	Return only matches dated before this date time. Return only matches dated after this date time.
`first_row`	Integer	(Default value 0). All of the matches are sorted and numbered starting from 0. E.g. if you have 100,000 matches, these are rows 0 through 99,999. If you specify a first_row, then the returned matches will be those rows starting from that row. E.g. without a specification (or if you specify 0), you might get rows 0 through 49,999. In your next request you might specify `first_row=50000` to get rows 50000 to 99,999.
`from_date`	DATETIME	Return only matches dated on or after this date time. Format “YYYY-MM-DD HH:MM:SS” in GMT.
`matches`	Integer	(Default value 50000). You can set matches to any value from 0 to 50000. This sets the maximum number of matches that will be returned in response to your request (starting from `first_row`, if specified). Note that if the response size starts to exceed 50MB, then the response can still include fewer than your “matches” number, to keep the response size reasonable. If you specify “0” here, then no matches will be returned; however, you will still get the information on the number of matches that there were. For asynchronous reports, i.e., where a `destination` is specified, the default value of `matches` is still 50000; however, there is no maximum value. If you want to initiate a query where you expect a large number of rows of results, for example 5 million, you must specify a value of `matches` that is as large or larger than your expected result set size, or else your result set will be truncated. This behavior is designed to help prevent the accidental creation of long an expensive report requests. Note also that for asynchronous reports, the trick of setting `matches` to `0` to get only the size of the result set is not supported. To get that only information, do not use an asynchronous report request.
`through_date`	DATETIME	Return only matches dated before or on this date time. Format “YYYY-MM-DD HH:MM:SS” in GMT.

Additional query string parameters, specific to specific reports, are described in the documentation for those API endpoints.

The response to a successful Report request will be a JSON object with a data section that is an associated array. Among other endpoint-specific data, it will contain a number of standard keyword data fields:

Response Data

The following fields will be found in the response for reporting requests that are not "asynchronous."

Keyword	Date Type	Description
`first`	Integer	The index of the first row that was returned. E.g. this should match the “first_row” parameter specified in the request.
`header`	Array	Array that includes the column headings for all of the matches returned. See API Report Responses for definitions of these column heading names.
`matches_returned`	Integer	The number of matching rows returned in the “rows” array.
`matches_total`	Integer	The number of rows matching your request. Note that this can be larger than the number of rows returned.
`rows`	Array	Array of data for all matches returned. This array contains one element for each returned match. Each element is itself an array. Each field in the match arrays corresponds to the respective field in the “header” array. E.g. if the 1st element in the “header” array is “user_id”, then the 1st element in a match data array will be the integer user ID number, etc.

Note: A report query could match zero to millions of rows, depending on the report and the context. Returning millions of rows in response to a single request is not feasible as the download would be absolutely huge. Report result sets are limited to at most 50,000 matches or about 50MB of data (whichever occurs first). If your query matches more than that, then only a subset of matches is returned. We recommend using the query parameters and the result data to page through your results as needed. We also recommend using the data restriction options to make your requests as specific as possible – this will make them faster and allow them to return smaller result sets. You can also use "asynchronous report requests," described below, to create larger reports.

Summary Report Requests

Some reports are so-called "summary reports" or "count reports" or "number of" reports. Instead of returning all matching rows, these return only a single row with columns that provide summary statistics such as "total number of" or "percentage of" etc.

For these API report requests, you can optionally supply up to three "filters" by passing parameters named "filter1", "filter2", and/or "filter3". When filters are provided, the API call applies each one to the summary results. If one of the supplied filters does not match the summary results, then no summary results are returned. This is useful in particular for WebHook Automated Reports, as if there are "no results", then no report is sent to WebHook URL or email. So, by using filters, you can cause Webhooks/emails for summary reports to be sent only under certain conditions. For example, one could create a Webhook "alert" that will fire only if the percentage of email delivery failures over the past hour is above 5% and the number of messages sent in the last hour is at least 1000.

Filters work as follows:

The value of each of the three filter parameters is an expression of the form: "field_name CONDITION value".
field_name must be the header name for one of the columns of data returned in the summary report. E.g., this could be "pct_failed" for the "Percentage of failed emails" summary column in the email delivery summary reports.

CONDITION must be of the following symbols or symbol pairs that governs how the specified field_name will be compared to value

Symbol	Example Use	Description
`<`	`field_name < value`	Less than
`<=`	`field_name <= value`	Less than or equals
`=`	`field_name = value`	Equals
`>=`	`field_name >= value`	Greater than or equals
`>`	`field_name > value`	Greater than

value must be the numerical value to which the value for field_name is compared.

Example

To create a filter that matches only "if the percentage of email delivery failures over the past hour is above 5% and the number of messages sent in the last hour is at least 1000", we would pass two filter parameters:

filter1 = pct_failed>0.05

filter2 = recipients>=1000

Asynchronous Report Requests

If you need to make reports that will return more than 50,000 rows, you can either perform many successive requests using the first_row parameter, as described above, or you can request an "asynchronous report." With an asynchronous report:

You make an API request for the report, indicating that it is asynchronous by specifying a destination parameter, described below, and setting matches to the upper bound of the number of rows to be returned.
LuxSci starts running that report in the background and immediately gives you a job number, job_id, and the name of a file where the report CSV will be saved.
You can use API calls to query the status of the job_id so that your application can known when it is complete (or failed).
When the job is complete, the CSV file containing the report data is uploaded to your specified destination location.
You can access the report CSV file from that destination.

Some general technical guidelines when using asynchronous report requests:

You must explicitly approve the result set size. The report will only return matches rows (which defaults to 50000), even if the potential number of matching rows is larger. For example, if your database has 10,000,000 email delivery records and you make an asynchronous report for all of them, only 50,000 rows will be returned unless you explicitly set the matches parameter to at least 10000000 in the request. This behavior is a guardrail against accidentally generating very large and long running report requests that excessively degrade your service performance.
The number of concurrent asynchronous reports is limited. By default, a customer account can only issue a limited number of asynchronous reports that will running concurrently (the default limit is 2). As each report can have an impact on your database performance, running many at once could significantly degrade your services. This limit is a guardrail and it can be increased for customers with enough database resources to support higher numbers of large concurrent reports.

Asynchronous Report Parameters

When making an API report query, the following additional parameters can be used to make the report request "asynchronous" and to specify where the resulting result set CSV file should be uploaded.

Keyword	Date Type	Description
`destination`	String	How the resulting CSV file should be uploaded to the `destination_hostname`. The only supported value at this time is "`SFTP`" which is an SFTP upload over the SSH protocol on port 22. If you specify a "`destination`" value in your API request, you enable "asynchronous report mode". Note, there is a 15-minute upload time limit for SFTP; this would only be an issue for very large reports being uploaded to very slow/busy SFTP servers.
`destination_directory`	String	Optional. If supplied, the SFTP directory is changed to this directory once authentication to the SFTP server is complete. The results fill will then be uploaded to this directory. This directory must exist and the `destination_username` user must have write permission to it for the upload to be successful.
`destination_hostname`	String	The server name to which your results file should be uploaded.
`destination_password`	String	The password to be used to authenticate (via SFTP) to `destination_hostname`.
`destination_username`	String	The username to be used to authenticate (via SFTP) to `destination_hostname`.

Asynchronous Report Responses

When you make an asynchronous API report request, the response will be different from a normal API report request (which is described above). This is because the API will return immediately and then start working on the report in the background. The following are returned in response to an asynchronous report request:

Keyword	Date Type	Description
`job_id`	Integer	Unique numerical ID number for the background job that is asynchronously creating your report. This number can be used with a "job" API call to query the status of the report creation (see below).
`filename`	String	The name of the file that will be uploaded to your destination when the job successfully completes.
`current_jobs`	Integer	The number of currently running asynchronous report requests for your account as a whole, including this one which you just submitted. In combination with `max_jobs`, this tells you how many more jobs you can submit before being required to wait for already-submitted jobs to first complete.
`max_jobs`	Integer	The maximum number of concurrently running asynchronous report requests permitted for your account as a whole. An asynchronous report request that would create more than this number of concurrent processes will be rejected.

Job Status Requests

The asynchronous report request product background "jobs" each with a "job_id". The "jobs" user- and account-level API calls can be used to check the status of these jobs. The "/job/job_id" call returns the status of "job_id"; the "/jobs" call returns the status of all jobs submitted in the last 7 days. See: Asynchronous Job Status Report Responses for the data returned by each of these requests.

Reporting Basics