When developing a product, analytics often gets left behind as a separate, non-developer concern, much like operations used to be. And like operations, perhaps it is time analytics gained first-class status as part of regular development. After all, if the product fails, there's nothing to develop! Let's have a look at what analytics is all about, and how developers can start thinking about it now.
Let's start with some definitions.
It really is not rocket science. Think of it as a data table where every dimension is a column, every event is a row, and metrics are the values within the cells. The table is sparse, as not every cell is populated. Rows are also not evenly populated: different rows can have different subsets of columns populated.
Here's an example table with some rudimentary dimensions (columns). We see that some dimensions are scope-related, while others are natural dimensions of the application.
user id (scope) | session id (scope) | request id (scope) | location (natural) | age (natural) | quantity (natural) | price (natural) |
---|---|---|---|---|---|---|
… | … | … | … | … | … | … |
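To make the sparse-table idea concrete, here is a minimal sketch in Python. The event values and the snake_case column names are made up for illustration; only the dimension names come from the table above. Each event is a dictionary holding just the dimensions it has, and the full table is derived by taking the union of all keys.

```python
# Hypothetical events: each dict is one row, each key is a dimension (column).
# A missing key means the cell is not populated for that event.
events = [
    {"user_id": "u1", "session_id": "s1", "request_id": "r1", "location": "Sydney", "age": 34},
    {"user_id": "u1", "session_id": "s1", "request_id": "r2", "quantity": 2, "price": 19.99},
    {"user_id": "u2", "session_id": "s7", "request_id": "r3", "location": "Perth"},
]

# The full set of columns is the union of the keys across all rows.
columns = sorted({key for event in events for key in event})


def to_row(event, columns):
    """Expand a sparse event into a full row, using None for empty cells."""
    return [event.get(column) for column in columns]


table = [to_row(event, columns) for event in events]

for row in table:
    print(row)
```

Storing events sparsely and expanding them only when needed is one possible design. It keeps collection cheap and lets new dimensions appear without a schema change.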
In addition to natural dimensions, we often have business dimensions: for example, the number of enquiries made, products purchased, products returned, etc.
These business-specific dimensions should be defined early to drive the development. We can't build and optimise what we're not measuring. The inverse strategy is to rely on "store-everything-and-analyse-later". Unfortunately, this strategy is leaned on far too often.
Up to this point, we have 3 classes of dimensions: scope, natural and business, on which we can perform some analytics.
We can compute basic statistics such as the mean and standard deviation along one dimension, answering questions like "what's the average purchase price?". We can also group those statistics by another dimension, answering questions like "what's the average purchase price by gender?". When combined with business dimensions, we can answer questions like "which demographic (a combination of location, age and gender) is most likely to buy higher-margin products?"
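As a rough sketch of the first two questions, the snippet below uses only the Python standard library. The purchase data is invented for illustration, and `gender` and `price` are assumed column names.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical purchase events; the values are made up for illustration.
purchases = [
    {"gender": "F", "price": 20.0},
    {"gender": "M", "price": 10.0},
    {"gender": "F", "price": 40.0},
    {"gender": "M", "price": 30.0},
]

# Statistics along one dimension: "what's the average purchase price?"
prices = [p["price"] for p in purchases]
average_price = mean(prices)
price_spread = stdev(prices)  # sample standard deviation

# The same statistic grouped by another dimension:
# "what's the average purchase price by gender?"
prices_by_gender = defaultdict(list)
for p in purchases:
    prices_by_gender[p["gender"]].append(p["price"])

average_by_gender = {gender: mean(values) for gender, values in prices_by_gender.items()}

print(average_price, average_by_gender)
```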
Mathematically, these questions are really about identifying the shape of event clusters in our N-dimensional space. The shape (or lack of one) tells us which dimensions are correlated with one another and which are not.
There is a 4th class of dimensions, which I'll call the control dimensions. These are additional properties that shape the elasticity of demand: features that make the application more appealing to users, for example branding, messaging, call-to-action triggers, and so on.
These control dimensions are usually not measured unless a test window is currently open. Once a variation has proven successful, it is usually merged into the application and the dimension is eliminated.
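One simple way to probe that shape is the Pearson correlation coefficient between two dimensions. The sketch below implements it by hand on made-up data. Note that it only detects linear relationships, so it is a starting point rather than a complete description of a cluster's shape.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance_x = sum((x - mean_x) ** 2 for x in xs)
    variance_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(variance_x * variance_y)


# Hypothetical values for three dimensions across the same four events.
ages = [20, 30, 40, 50]
prices = [10.0, 20.0, 30.0, 40.0]
quantities = [1, 3, 3, 1]

age_vs_price = pearson(ages, prices)        # close to 1: strongly correlated
age_vs_quantity = pearson(ages, quantities)  # close to 0: no linear relationship

print(age_vs_price, age_vs_quantity)
```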
When implemented correctly, these can be the most valuable dimensions in tuning your application for success.
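As an illustration, suppose a test window is open for two call-to-action variants. The sketch below compares conversion rates per variant. The `cta_variant` and `purchased` fields are hypothetical names, and the data is invented.

```python
from collections import Counter

# Hypothetical events recorded while a test window is open.
# "cta_variant" is the control dimension, "purchased" a business dimension.
events = [
    {"user_id": "u1", "cta_variant": "A", "purchased": False},
    {"user_id": "u2", "cta_variant": "B", "purchased": True},
    {"user_id": "u3", "cta_variant": "A", "purchased": True},
    {"user_id": "u4", "cta_variant": "B", "purchased": True},
    {"user_id": "u5", "cta_variant": "A", "purchased": False},
    {"user_id": "u6", "cta_variant": "B", "purchased": False},
]

# Count how many users saw each variant, and how many of those purchased.
exposures = Counter(e["cta_variant"] for e in events)
conversions = Counter(e["cta_variant"] for e in events if e["purchased"])

conversion_rate = {
    variant: conversions[variant] / exposures[variant] for variant in exposures
}

print(conversion_rate)
```

With a sample this small the difference means nothing. A real test would need enough events, and a significance check, before a variant is merged.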
Not all of the 4 classes of dimensions described (scope, natural, business and control) need to be stored directly in their final form. Indeed, it can be more useful to store them in raw or intermediate forms, then transform them into their final form during analysis.
For example, we can store a customer's birth date, but we may choose to analyse their age at time of purchase instead. By storing the birth date, we leave open the possibility of calculating their current age as well. If we had stored their age at time of purchase instead, we'd have lost that flexibility.
Use your judgement though, as you can easily fall into the "store-everything-and-analyse-later" trap with this approach.
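The birth date example can be sketched as a small transformation applied at analysis time. The function name and the dates below are illustrative only.

```python
from datetime import date


def age_on(birth_date, on_date):
    """Whole years elapsed between birth_date and on_date."""
    years = on_date.year - birth_date.year
    # Subtract one year if the birthday has not yet occurred in on_date's year.
    if (on_date.month, on_date.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


# Stored raw form: the birth date. Derived forms are computed when needed.
birth_date = date(1990, 6, 15)
purchase_date = date(2020, 3, 1)

age_at_purchase = age_on(birth_date, purchase_date)
current_age = age_on(birth_date, date.today())

print(age_at_purchase, current_age)
```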
If this sounds familiar, it is probably because this is a common and solved problem. There already exists a range of business intelligence and analytics tools to perform the collection and analysis of data. On the backend, there are tools like Splunk, Sumo Logic, and more. On the frontend, there are tools like Google Analytics, Adobe Analytics, and more.
Or indeed, you can roll your own! All you need is a database or a spreadsheet to store the N-dimensional data, and you can use whatever tools you need to extract the shape of the event clusters. Indeed, this is what the aforementioned tools do underneath the shiny, user-friendly interface.
Analytics need not be a black box that gets relegated to the marketing team. It is integral to application development. I hope that by casting the problem as a mathematical one, it becomes obvious what needs to be collected and how, and subsequently, what needs to be analysed and how.
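As a minimal roll-your-own sketch, the snippet below uses Python's built-in `sqlite3` module with an in-memory database. The table layout mirrors the example dimensions from earlier, and the rows are made up. Empty cells are stored as `NULL`, which keeps the table sparse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One column per dimension; every column is nullable, so rows can be sparse.
conn.execute(
    """
    CREATE TABLE events (
        user_id TEXT, session_id TEXT, request_id TEXT,
        location TEXT, age INTEGER, quantity INTEGER, price REAL
    )
    """
)

# Hypothetical events; None becomes NULL (an unpopulated cell).
rows = [
    ("u1", "s1", "r1", "Sydney", 34, 2, 10.0),
    ("u2", "s2", "r2", "Perth", None, 1, 30.0),
    ("u3", "s3", "r3", "Sydney", 51, None, None),
    ("u4", "s4", "r4", "Sydney", 28, 1, 20.0),
]
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

# "What's the average purchase price by location?"
average_price_by_location = dict(
    conn.execute(
        "SELECT location, AVG(price) FROM events "
        "WHERE price IS NOT NULL GROUP BY location"
    ).fetchall()
)
conn.close()

print(average_price_by_location)
```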