Author: Danielle Mosimann

Make or Buy? Organic growth or M&A?

A tale of two value creation opportunities 

On 29 December 2016 and again on 7 February 2017, the Financial Times wrote about an M&A boom: “The M&A boom will carry on…Many companies face poor organic growth prospects, forcing them to consider buying rivals or expanding in new territories…” Deloitte reports that 75% of executives expect deals to increase in 2017, while according to Moody’s, “A ‘major theme’ of recent activity was positioning for the future through the acquisition of technology.”

Does this strong appetite for acquisitions create economic value?

Depending on the industry and the profile of the acquisition target, bigger may well be better. But if both the acquirer and the acquired were struggling to grow before the acquisition, are we not simply moving the problem to the future?

Acquisitions instantly increase revenues and usually earnings per share. From this perspective, deals that expand a business’ geographic footprint and improve the competitive position can be an attractive approach, especially in mature markets. Focus on managing the integration of both companies, improving economic performance and cost efficiency and leveraging greater market share will certainly create some economic value. However, after a few years these effects taper off and executives will need to decide what’s next.

Furthermore, the current high share prices or, perhaps better said, the high market valuations put pressure on executives to quickly deliver synergies. This is usually shorthand for cutting costs. According to a Bain study from late 2014, 70% of companies announce synergies that are higher than the scale curve suggests. Unsurprisingly, most companies will be disappointed by the actual synergies created.

Against that background, organic growth can be an attractive alternative. Executives often underestimate the power of organic growth. Clearly, organic growth takes more effort and time for growth to manifest itself, but our research shows that organic growth typically generates up to one third more economic value.

This is hardly surprising, as the upfront investment for organic growth is lower, while for acquisitions the acquisition price usually includes a takeover premium. Therefore, over time ROIC and ROE are higher for organic growth than for acquisitions.
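The arithmetic behind this can be illustrated with a minimal R sketch. All figures are hypothetical and are not taken from our research: the same operating profit is earned either by investing organically or by acquiring the same asset base at an assumed 30% takeover premium.

```r
# Hypothetical figures, for illustration only.
operating_profit   <- 10    # Annual after-tax operating profit from the new business.
organic_investment <- 100   # Capital invested to build the business organically.
takeover_premium   <- 0.30  # Assumed premium paid over the same asset base.

# The acquirer pays the premium on top of the underlying investment.
acquisition_investment <- organic_investment * (1 + takeover_premium)

roic_organic     <- operating_profit / organic_investment
roic_acquisition <- operating_profit / acquisition_investment

cat("ROIC, organic:    ", round(100 * roic_organic, 1), "%\n")
cat("ROIC, acquisition:", round(100 * roic_acquisition, 1), "%\n")
```

With the same profit and a larger capital base, the acquisition route shows the lower return for as long as the premium is not offset by synergies.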

This is a good reason to look hard at creating internal growth opportunities and leave the acquisitions to competitors that have run out of ideas.

Karel Leeflang and Roland Mosimann, StrategyPod powered by AlignAlytics

Posted on March 10, 2017 by Danielle Mosimann

Segmenting your way to pricing profits!

For many, segmentation is the single critical factor that can drive a differentiated pricing agenda and therefore an accelerated route to profit growth. While this truth may be obvious to most, why is segmentation so difficult to implement in practice? Time and time again, the gap between the theory of using segmentation for pricing excellence and the implemented reality seems to be very wide in many businesses.


So why is there such a gap between the theory and reality of good segmentation? A number of practical hurdles present themselves when it comes to segmentation:



  • There is the Data Segmentation Mountain to climb, which is hard work!
  • There is segmentation, segmentation and segmentation! What level of sophistication are you able and willing to implement and how does it relate to perceived customer value?
  • The organisational and functional bias of the business will influence the segmentation. Is the segmentation drive more finance, sales or marketing led and how are these interconnecting influences integrated?
  • Segmentation “velocity” or its propensity to change is a dynamic that is often underestimated. If the segmentation cannot be kept up to date the whole framework for differentiated pricing quickly deteriorates.

1. The Data Segmentation Mountain


The issue here is the complexity and size of the data that needs to be segmented. Specifically, this is a function of your channels or routes to market, the number of product items, your existing and prospective customers, and then your market segments. Even with a limited number of product items, customers and channels the picture quickly gets complex, e.g. 100 items x 100 customers x 4 channels = 40,000 elements to be segmented.
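To make that multiplication concrete, here is a minimal R sketch that builds every item, customer and channel combination. The item, customer and channel names are hypothetical placeholders.

```r
# Hypothetical dimensions matching the example above.
items     <- paste0("item_", 1:100)
customers <- paste0("customer_", 1:100)
channels  <- c("direct", "distributor", "online", "retail")

# expand.grid builds one row per item x customer x channel combination.
elements <- expand.grid(item=items, customer=customers, channel=channels,
                        stringsAsFactors=FALSE)

# Number of elements that would need a segment assigned.
nrow(elements)
```

Adding one more dimension, such as a market segment, multiplies the row count again, which is why the volume grows so quickly.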

This can be further complicated if the business works across more than one ERP system and there needs to be a product and customer alignment between these systems. Master data management is a big theme within IT to ensure that customer records or product hierarchies are correctly mapped across various ERP systems. However, these mappings frequently do not include a more commercial and market-specific segmentation. It would make sense to incorporate a market-led segmentation at the same time, but unfortunately the organisational silos of a typical business often seem to hinder such an outcome.

Techniques for segmenting and categorising your data mountain exist and typically require a blending of business, data, analytical and IT capabilities. This blend allows for a systematic approach that can then process the segments even where the complexity and data load is daunting.


2. Segmentation vs segmentation vs segmentation


Segmentation will vary in sophistication partially because it is difficult to implement but also because business models work differently across industries and markets.

Segmentation & Ability to Price

The more finely a business’s customers can be segmented into value categories, the more a business can extract that value through differentiated pricing. Since value is related to the whole business proposition, often involving intangibles (e.g. Service and Relationship), it is in fact the perceived customer value that counts. But this value segmentation needs to take account of the competitive or alternative options a customer is being offered before determining whether the value segmentation is real.

The mechanism to define value segments can take multiple forms depending on the industry and the ability to identify value drivers. Some of these may be behavioural (e.g. consumer behaviour for different occasions), others are linked to your contribution to a customer’s own value proposition, and then there are various market conditions that can also impact the perception of value.

In its simplest form, segmentation is typically about customer account size, where a business will give better pricing terms to a large customer than to a small customer. A more sophisticated approach will look more strategically at a customer’s potential and at whether a market segment is attractive or not. At the most sophisticated extreme, a business has systematically worked out a value-based pricing segmentation where it fully understands its value to its customers and can defend its position against competitive alternatives.


3. Organisation or Functional Bias


Are the organisational influencers of pricing coming more from Finance, Sales or Marketing? Typically, the pricing agenda, and therefore any implemented segmentation, will be shaped by the answer. From a Finance perspective I will be looking, for example, at my gross margin and how the price relates to my costs. I may also be pushing for increased prices to improve margin, but without necessarily fully understanding the market context and pressures. With the Sales function there is a tendency to look for aggressive pricing to close deals, and this often pushes the organisation towards a customer account size segmentation. Ideally, the Marketing function is able to look beyond the more short-term influences that can come from Sales and Finance and drive through a segmentation that takes account of customer and market value differentiators as well as the product life cycle.


4. Segmentation Velocity


In practice, making the segmentation a manual exercise is likely to end in failure. The best approach is to frame the exercise with a logical hierarchy from which various algorithms can then be used to drive the detail of the segmentation. For some businesses this will be market and customer led, while for others it may be a product-based segmentation. So yes, part of the work is manual, but only at a high level; the detail then gets allocated based on a number of business rules. Ultimately, a rules-based segmentation is the best way to keep the segmentation framework up to date and relevant.
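As a minimal sketch of what a rules-based allocation might look like in R, the example below assigns segments from a short, ordered list of business rules. The customer data, thresholds and segment names are all hypothetical; real rules would come out of the high-level manual framing described above.

```r
# Hypothetical customer data.
df_customers <- data.frame(
   customer       = c("A", "B", "C", "D"),
   annual_revenue = c(500000, 40000, 120000, 8000),
   market         = c("core", "core", "growth", "tail"),
   stringsAsFactors = FALSE)

# Business rules, applied in order; the first matching rule wins.
assign_segment <- function(annual_revenue, market){
   ifelse(market == "growth",       "Strategic",
   ifelse(annual_revenue >= 100000, "Key account",
   ifelse(annual_revenue >= 10000,  "Standard", "Tail")))
}

# Re-running this line whenever the data changes keeps the segmentation up to date.
df_customers$segment <- assign_segment(df_customers$annual_revenue, df_customers$market)
print(df_customers)
```

Because the rules live in one function, a change in thresholds or a new rule is applied to every customer in a single pass rather than record by record.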



Author: Patrick Mosimann


Posted on February 18, 2017 by Danielle Mosimann

Insights Fit For Action

Tailored analytic solutions to empower better and faster decisions

Posted on December 6, 2016 by Danielle Mosimann

Technology & Data

A catalyst for innovation and growth in a fast moving world

Posted on December 6, 2016 by Danielle Mosimann

Strategic & Analytical Experience

A uniquely fused capability of strategy consulting,
analytical problem-solving, technology and data expertise

Posted on December 6, 2016 by Danielle Mosimann

Using Simulation to Determine Sample Sizes for a Study of Store Sales

Suppose a client wants to estimate the total sales value of widgets in a large number of stores. To do this, they will survey a sample of that population of stores. You need to provide the client with advice on choosing a suitable sample size.

Unfortunately, the client has little information to help you. They know that there are 1,000 stores that sell widgets. But they have no idea what the average store sales might be. All they know from previous studies is that the sales tend to be very right-skewed: most stores sell very few widgets and very few stores sell a lot of widgets.

This is a fairly typical situation. We deal with a lot of sales data at AlignAlytics and typically find sales volumes and values (allowing for sale price to vary between sellers) to be very right-skewed. Sales volumes are often well described by a Poisson distribution; a Pareto or a chi-square distribution often works well for sales values.
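A quick way to see what right skew means in practice is to compare the mean and median of simulated values. This is a minimal sketch using a chi-square distribution with two degrees of freedom as a stand-in for store sales values; the seed and the number of draws are arbitrary choices.

```r
set.seed(1)  # For reproducibility.

# Hypothetical sales values for 10,000 stores.
sales_value <- rchisq(10000, df=2)

# In a right-skewed distribution the long upper tail pulls the mean above the median.
c(mean=mean(sales_value), median=median(sales_value))
```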

So, let’s suppose the client tells us that they expect the sales value per store to be distributed something like this:

Histogram of Sales

That looks very much like a chi-square distribution with two degrees of freedom. So we run the following R code:

# Create the distribution function.
r_dist_fn <- function(n) rchisq(n, 2)

# Get the dataframe of confidence intervals.
df_ci <- estimate_ci_from_sample(r_dist_fn, pop_size=c(1000), min_sample=10, max_sample=50, n_simulations=100000, confidence=c(50, 80, 90, 95))

That gives us a dataframe with rows that look like this:

Confidence Intervals Dataframe

This tells us, for instance, that if we use a sample of 20 stores, there is a 90% chance that the total sales in the population of stores is between approximately 72% and 150% of the estimate based on the sample.
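The calculation behind that statement can be sketched in a few self-contained lines for a single sample size of 20. This sketch uses far fewer simulations than the run above, so its quantiles will only roughly match the figures quoted; the variable names are chosen for illustration.

```r
set.seed(42)
population_size <- 1000
sample_size     <- 20
n_simulations   <- 2000

ratio <- numeric(n_simulations)
for (i in seq_len(n_simulations)){
   # Draw a fresh population and a random sample of stores from it.
   population <- rchisq(population_size, 2)
   sample_from_pop <- sample(population, sample_size)
   # True population total divided by the total estimated from the sample.
   ratio[i] <- sum(population) / (population_size * mean(sample_from_pop))
}

# 90% interval: the 5th and 95th percentiles of the simulated ratios.
ci_90 <- quantile(ratio, probs=c(0.05, 0.95))
ci_90
```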

Here's a bit of code that graphs that dataframe of confidence intervals:

for (pop in sort(unique(df_ci$population_size))){
   # Subset df_ci by population size and get the confidence intervals calculated for that subset.
   df_ci_sub <- df_ci[df_ci$population_size==pop,]
   confidence_interval <- sort(unique(df_ci_sub$confidence))

   # Create an empty plot of the required size.
   plot(x=c(min(df_ci_sub$sample_size), max(df_ci_sub$sample_size)), 
        y=c(min(df_ci_sub$pop_total_div_estimated_total_ci_lower), max(df_ci_sub$pop_total_div_estimated_total_ci_upper)), 
        main=paste0("Confidence Intervals (", paste(confidence_interval, collapse="%, "), "%) for Population Total / Sample Total (Population: ", pop, ")"),
        type='n', xlab="Sample Size", ylab="Population Total / Sample Total")
   # Loop across the confidence intervals.
   for (ci in confidence_interval){

      # Graph a confidence interval as a semi-transparent band.
      df_ci_sub_sub <- df_ci_sub[df_ci_sub$confidence==ci,]   
      polygon(c(df_ci_sub_sub$sample_size,                            rev(df_ci_sub_sub$sample_size)), 
              c(df_ci_sub_sub$pop_total_div_estimated_total_ci_lower, rev(df_ci_sub_sub$pop_total_div_estimated_total_ci_upper)), 
              col=rgb(0, 0, 1, 0.2), border=TRUE)
   } # Ends ci for loop.

   # Draw a horizontal line at y=1.
   lines(y=c(1, 1), x=c(min(df_ci_sub$sample_size), max(df_ci_sub$sample_size)))   
} # Ends pop for loop.

And here's the output:

Confidence bands for Population Total / Sample Total

In the above graph, the widest confidence interval is the 95% interval, the thinnest (closest to the horizontal line at y=1) is the 50% confidence interval.

So, as before, there is a 90% chance that the total sales in the population of stores is between approximately 72% and 150% of the estimate based on the sample:

Confidence bands for Population Total / Sample Total with lines showing 90% interval for a 20 sample size

Using the above graph and the dataframe of confidence intervals, the client should be able to choose a sensible sample size. This will involve balancing the cost of increasing the sample size against the accuracy improvement achieved by doing so.
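One way to turn an accuracy requirement into a sample size is to look up the smallest sample whose interval fits inside the tolerance the client can accept. The sketch below assumes a dataframe with the same columns as df_ci; the values in df_ci_example are made up for illustration and are not simulation output, and smallest_sample_size is a hypothetical helper rather than part of the code in this post.

```r
# Hypothetical excerpt in the same layout as df_ci (values are made up for illustration).
df_ci_example <- data.frame(
   population_size = 1000,
   sample_size     = c(10, 20, 30, 40, 50),
   confidence      = 90,
   pop_total_div_estimated_total_ci_lower = c(0.64, 0.72, 0.76, 0.79, 0.81),
   pop_total_div_estimated_total_ci_upper = c(1.85, 1.50, 1.38, 1.31, 1.27))

smallest_sample_size <- function(df, conf, lower_limit, upper_limit){
   # Returns the smallest sample size whose interval at the given confidence level
   # lies within [lower_limit, upper_limit], or NA if no sample size qualifies.
   ok <- df$confidence == conf &
         df$pop_total_div_estimated_total_ci_lower >= lower_limit &
         df$pop_total_div_estimated_total_ci_upper <= upper_limit
   if (any(ok)) min(df$sample_size[ok]) else NA
}

smallest_sample_size(df_ci_example, conf=90, lower_limit=0.75, upper_limit=1.40)
```

This covers only the accuracy side of the trade-off; the cost of surveying each additional store still has to be weighed against it.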

Finally, here's the estimate_ci_from_sample function used above:

estimate_ci_from_sample <- function(r_dist_fn, pop_size, min_sample, max_sample, n_simulations=1000, confidence=c(50, 80, 90, 95)){
   # Returns a dataframe of confidence intervals for the sum of a population of real numbers (values given by r_dist_fn) divided by
   # the sum of a sample from that population.
   # r_dist_fn:     A function taking one parameter, n, and returning n random samples from a distribution.
   # pop_size:      A vector of population sizes, e.g. c(100, 1000, 2000).
   # min_sample:    If min_sample is in (0, 1) the minimum sample size is a fraction of the population size. If min_sample is a
   #                positive integer, the minimum sample size is a fixed number (= min_sample).
   # max_sample:    If max_sample is in (0, 1) the maximum sample size is a fraction of the population size. If max_sample is a
   #                positive integer, the maximum sample size is a fixed number (= max_sample).
   # confidence:    A vector of the required confidence intervals, e.g. c(50, 80, 90, 95).
   # n_simulations: The number of simulations to run per population size + sample size combination. The higher this is, the more
   #                accurate the results but the slower the calculation. 
   # Useful functions.
   is_int <- function(x) x %% 1 == 0
   sample_int <- function(spl, pop_size) ifelse(is_int(spl), min(spl, pop_size), round(spl * pop_size))
   # Check the min_sample and max_sample parameters.
   if (min_sample <= 0 || (min_sample > 1 && !is_int(min_sample))) stop("min_sample must be in (0, 1) or be an integer in [1, inf).")
   if (max_sample <= 0 || (max_sample > 1 && !is_int(max_sample))) stop("max_sample must be in (0, 1) or be an integer in [1, inf).")
   if (is_int(min_sample) == is_int(max_sample) && max_sample < min_sample) stop("max_sample should be greater than or equal to min_sample.")
   # Create the dataframe to hold the results.
   df_ci <- data.frame()

   for (population_size in pop_size){

      # Determine the sample size range.
      sample_int_min <- sample_int(min_sample, population_size)
      sample_int_max <- sample_int(max_sample, population_size)
      # Yes, it can happen that sample_int_min > sample_int_max, despite the parameter checks, above.
      if (sample_int_min <= sample_int_max){
         for (sample_size in seq(sample_int_min, sample_int_max)){
            cat(paste0("\nCalculating ", n_simulations, " ", sample_size, "-size samples for population size ", population_size, "."))
            # Calculate the pop_total_div_estimated_total vector.
            pop_total_div_estimated_total <- rep(NA, n_simulations)
            for (i_sim in 1:n_simulations){
               population <- r_dist_fn(population_size)
               sample_from_pop <- sample(population, sample_size)
               pop_total_div_estimated_total[i_sim] <- sum(population) / (population_size * mean(sample_from_pop))
            } # Ends i_sim for loop.
            # Loop across the required confidence levels.
            for (conf in confidence){
               # Calculate the confidence interval.
               alpha <- (100 - conf) / 100
               ci <- quantile(pop_total_div_estimated_total, probs=c(alpha / 2, 1 - alpha / 2))
               # Add a row to the dataframe.
               df_ci_row <- data.frame(population_size                        = population_size, 
                                       sample_size                            = sample_size, 
                                       confidence                             = conf, 
                                       pop_total_div_estimated_total_ci_lower = ci[1], 
                                       pop_total_div_estimated_total_ci_upper = ci[2])
               df_ci <- rbind(df_ci, df_ci_row) 
            } # Ends conf for loop.
         } # Ends sample_size for loop.

      } # Ends if.

   } # Ends population_size for loop.

   return(df_ci)
}

Posted on May 26, 2016 by Danielle Mosimann

Drawing a Grid of Plots in R — Regression Lines, Loess Curves and More

We provide here an R function that draws a grid of plots, revealing relationships between the variables in a dataset and a given target variable.

Scatterplots in the grid include regression lines, lowess smoothing curves (drawn with R's lowess function) and the adjusted R-squared statistic.

Boxplots have points indicating the group means. Box widths are proportional to the square root of the number of observations in the relevant group. The p-value is shown for an F-test: p < 0.05 indicates a significant difference between the means of the groups. But don't take this p-value on faith: be sure to check the assumptions of the one-way ANOVA model.
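As a minimal sketch of such a check, the code below tests two of the usual one-way ANOVA assumptions on the built-in mtcars data: normality of the residuals and equal variance across groups. The choice of the Shapiro-Wilk and Bartlett tests is ours; other diagnostics, including residual plots, are equally reasonable.

```r
# Example: displacement by number of cylinders in the built-in mtcars dataset.
df_cars <- mtcars
df_cars$cyl <- as.factor(df_cars$cyl)
fit <- aov(disp ~ cyl, data=df_cars)

# Normality of residuals (Shapiro-Wilk test): a small p-value suggests non-normal residuals.
shapiro_p <- shapiro.test(residuals(fit))$p.value

# Homogeneity of variance across groups (Bartlett's test): a small p-value suggests unequal variances.
bartlett_p <- bartlett.test(disp ~ cyl, data=df_cars)$p.value

c(shapiro_p=shapiro_p, bartlett_p=bartlett_p)
```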

Mosaic plots include the p-value of a chi-square test of independence: p < 0.05 indicates that there is a significant relationship between the two variables under consideration. The number of plot cells with a count under five is shown; if this is greater than zero, the chi-square test may be invalid.

Here's an example using a continuous target variable:

mtcars2 <- mtcars
mtcars2$cyl  <- as.factor(mtcars2$cyl)
mtcars2$vs   <- as.factor(mtcars2$vs)
mtcars2$am   <- as.factor(mtcars2$am)
mtcars2$gear <- as.factor(mtcars2$gear)
mtcars2$carb <- as.factor(mtcars2$carb)
multiplot(mtcars2, 'disp', c(2, 5))

Drawing a Grid of Plots in R

This example has a categorical target variable:

multiplot(mtcars2, 'gear', c(2, 5))

Drawing a Grid of Plots in R
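The usual rule of thumb for the chi-square test refers to expected rather than observed cell counts. The sketch below, which rebuilds the cylinder-by-gear table directly from mtcars so that it is self-contained, inspects the expected counts and falls back to Fisher's exact test. Fisher's test is one common alternative and is our suggestion here, not something multiplot does.

```r
tbl <- table(as.factor(mtcars$cyl), as.factor(mtcars$gear))

# chisq.test warns when the approximation may be poor; suppress the warning and inspect directly.
chi <- suppressWarnings(chisq.test(tbl))
low_expected_cells <- sum(chi$expected < 5)

# Fisher's exact test does not rely on the large-sample approximation.
fisher_p <- fisher.test(tbl)$p.value

c(low_expected_cells=low_expected_cells, chi_square_p=chi$p.value, fisher_p=fisher_p)
```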

Finally, here’s the multiplot function:

multiplot <- function(df_data, y_column, mfrow=NULL){
   # Plots the data in column y_column of df_data against every other column in df_data, a dataframe.
   # By default the plots are drawn next to each other (i.e. in a row). Use mfrow to override this. E.g. mfrow=c(2, 3). 
   # Set the layout
   if (is.null(mfrow)) mfrow <- c(1, ncol(df_data) - 1)
   op <- par(mfrow=mfrow, mar=c(5.1, 4.1, 1.1, 1.1), mgp = c(2.2, 1, 0))

   for (icol in which(names(df_data) != y_column)){
      x_column <- names(df_data)[icol]
      y_x_formula <- as.formula(paste(y_column, "~", x_column))
      x_y_formula <- as.formula(paste(x_column, "~", y_column))
      x <- df_data[[x_column]]
      y <- df_data[[y_column]]
      subtitle <- ""
      if (is.factor(x)){
         if (is.factor(y)){
            # Mosaic plot.
            tbl <- table(x, y)
            chi_square_test_p <- chisq.test(tbl)$p.value
            problem_cell_count <- sum(tbl < 5)
            subtitle <- paste("Chi-Sq. Test P:", round(chi_square_test_p, 3)," (< 5 in ", problem_cell_count, " cells.)")
            plot(y_x_formula, data=df_data)
         } else {
            # Vertical boxplot.
            fit <- aov(y_x_formula, data=df_data)
            f_test_p <- summary(fit)[[1]][["Pr(>F)"]][[1]]
            subtitle <- paste("F-Test P:", round(f_test_p, 3))
            boxplot(y_x_formula, data=df_data, horizontal=FALSE, varwidth=TRUE)
            means <- tapply(y, x, function(z){mean(z, na.rm=TRUE)})
            points(x=means, col="red", pch=18)
         }
      } else {
         if (is.factor(y)){
            # Horizontal boxplot.
            fit <- aov(x_y_formula, data=df_data)
            f_test_p <- summary(fit)[[1]][["Pr(>F)"]][[1]]
            subtitle <- paste("F-Test P:", round(f_test_p, 3))
            boxplot(x_y_formula, data=df_data, horizontal=TRUE, varwidth=TRUE)
            means <- tapply(x, y, function(z){mean(z, na.rm=TRUE)})
            points(x=means, y=1:length(levels(y)), col="red", pch=18)
         } else {
            # Scatterplot with straight-line regression and lowess line.
            adj_r_squared <- summary(lm(y_x_formula, df_data))$adj.r.squared
            subtitle <- paste("Adj. R Squared:", round(adj_r_squared, 3))
            plot(y_x_formula, data=df_data, pch=19, col=rgb(0, 0, 0, 0.2))
            abline(lm(y_x_formula, data=df_data), col="red", lwd=2)
            lines(lowess(x=x, y=y), col="blue", lwd=2) 
         }
      }
      title(sub=subtitle, xlab=x_column, ylab=y_column)
   } # Ends icol for loop.

   # Restore the previous graphics settings.
   par(op)
}

Posted on March 21, 2016 by Danielle Mosimann

The Changing Face of Vendor Analytics

Our most recent vendor project was an interesting change in direction compared to several vendor related projects we have previously worked on. We were asked to build out a vendor reporting capability that went beyond simple spend analytics and also brought in data from online sources such as Twitter, Google, Bloomberg, Reuters and Facebook.

This project brought forward interesting trends not just in the area of vendor analytics but also in how datasets that underpin traditional reporting areas such as sales and budgeting are likely to expand in scope to include more and more data from online sources.

Some companies are constantly meeting vendors and they need to make sure that they are asking the right questions and signing off the correct deals in these meetings. For this they need their staff to understand more than just the historical spending with the vendor. They need to know how that company is represented in the news, what key events the vendor has been involved in, such as mergers or financial results, and what people are saying about them.

Historically, BI solutions have focused on summarizing and visualizing the internal data side of the business – sales, spending, CRM… Users would then supplement this with their own knowledge and research of customers, competitors and suppliers to build up an understanding of their environment. Recently, however, our aim was to improve how users gather information from research. To achieve this, a BI solution needs to capture a wide variety of data sources, which are then analysed, aggregated and presented back to the user in an easily understood way. Then, by automatically combining this with spend data, you can also allow users to better understand the relationships between datasets.

New role of BI Solutions

In order to pull all this together, there are 4 key areas that need to be built out:

    • Text mining – you need to find a way of summarizing the large amounts of unstructured content that are brought in from online data sources – after reviewing several options we went with AlchemyAPI.
    • Data storage – a NoSQL database is needed to store and analyse the large-scale unstructured data collected from multiple sources – for this we used ElasticSearch.
    • Data mashing – a more traditional database layer is needed to combine summary results from the unstructured data with internal vendor spend data – for this we stuck with SQL Server.
    • Reporting layer – to deliver the solution we used Tableau to create a series of reports that allowed users to interact with the combined data.
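To illustrate the data mashing step, here is a minimal sketch of joining a sentiment summary to vendor spend. The project did this in SQL Server; R is used here only for consistency with the other code on this blog. The vendor names, figures, column names and the review threshold are all hypothetical.

```r
# Hypothetical data: internal spend and a sentiment summary derived from online articles.
df_spend <- data.frame(vendor=c("Vendor A", "Vendor B", "Vendor C"),
                       annual_spend=c(250000, 90000, 40000))
df_sentiment <- data.frame(vendor=c("Vendor A", "Vendor B"),
                           article_count=c(120, 15),
                           mean_sentiment=c(-0.35, 0.20))

# Left join: keep every vendor with spend, even if no articles were found.
df_vendor <- merge(df_spend, df_sentiment, by="vendor", all.x=TRUE)

# Flag high-spend vendors with negative sentiment for review.
df_vendor$review <- !is.na(df_vendor$mean_sentiment) &
                    df_vendor$mean_sentiment < 0 &
                    df_vendor$annual_spend > 100000
df_vendor
```

Keeping vendors with no online coverage in the result matters, because a missing sentiment score is itself information for the person preparing a vendor meeting.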


Our final architecture looked something like this:

Data flow architecture


This project has led to several useful findings:

    • Overall, the area of vendor analytics is enhanced by blending the spend data with online data sources. Events such as a vendor being acquired by another company, a successful project collaboration or a sales event need to be visible in the tool's output.
    • The ability of AlchemyAPI to mine insights from text content is critical. This includes sentiment analysis but also tackles entity extraction – the process of relating people, places, companies and events to articles.
    • With AlchemyAPI you don’t have to store the content of every article (which is also why we chose it as the best tool for text analytics). You can simply send AlchemyAPI the URL of the relevant article and they analyse the content – other solutions require you to capture the full content of an article and send it to their applications.
    • ElasticSearch delivers what is needed from a NoSQL database with its flexibility to store and analyse large scale unstructured data from multiple sources. Its ability to allow multiple processes to collate and analyze data, simultaneously in real time, gives it significant advantages over other data storage solutions.
    • Having built several solutions in Tableau, we are aware of its traditional strengths. However, for this kind of project it is the ability to store web links in a dashboard, which users can then access, that is particularly useful. So if a spike in negative sentiment occurs for a supplier, a user can quickly navigate from a trend chart in Tableau to a summary of the articles' content, again stored in Tableau, and ultimately to the most useful articles online.


In conclusion, we found that the area of vendor analytics can be enhanced by combining traditional spend data with online content. The process of combining unstructured online data with spend and sales data is likely to become the norm in future BI developments as companies seek to answer the questions that internal data cannot answer on its own.


Author: Angus Urquhart

Posted on March 1, 2016 by Danielle Mosimann

Is there ever a good time to do a price increase?

The short answer: NEVER and ALWAYS!

The longer answer:

A price increase is always difficult to achieve successfully and yet doing nothing is a gradual recipe for financial disaster. Why? Your costs are never static, so within 5 years your profit margin could easily be ZERO.

Cost Inflation Impact on Profit without Price Increase
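The effect is easy to reproduce with a few lines of R. The starting point below is hypothetical: a price of 100, a cost of 90 (a 10% margin) and cost inflation of 2% a year, with the price held constant.

```r
# Hypothetical starting point: price 100, cost 90 (a 10% margin), 2% annual cost inflation.
price          <- 100
cost_year_0    <- 90
cost_inflation <- 0.02
years          <- 0:6

# Costs compound each year while the price stays fixed.
cost   <- cost_year_0 * (1 + cost_inflation) ^ years
margin <- (price - cost) / price

data.frame(year=years, cost=round(cost, 2), margin_pct=round(100 * margin, 1))
```

Under these assumptions the margin falls below 1% by year five and turns negative in year six. A business starting with a thinner margin or facing higher inflation gets there sooner.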

Of course good cost and supplier management can counteract this trend and is a very common and effective strategy BUT, ultimately you cannot “cut” your way out of a profit gap without damaging the long term viability of the business.


Price Increase Cartoon


Price increases are difficult because no one believes they are easy to implement, acceptable to the customer or competitively achievable.

It is certainly not easy to analyse and think through the various implications of a price increase. Ten customers and ten products might be easy enough (10×10 = 100 customer price combinations), but since most businesses are dealing with hundreds of thousands of price combinations, the implementation complexity is usually very significant.

Also, the taboo of communicating a price rise usually raises strong emotions and outright fear among sales reps. “What will the client say”, “I’m going to be crucified by procurement”, “I’m going to lose the contract (and my bonus)”…

The internal resistance, especially by the front line, is therefore understandable and real, often leading to internal political lobbying to neutralize or partially counteract a price initiative. All this can become quite heated and damaging if left unresolved.

Of course both camps of the argument, to increase prices or not, are partially right. The competitor angle especially tends to be the key argument against any price increase – “we are going to lose share”, “the competitive alternative is better value at that price”. And, while this is often true, it is also a common and dangerous excuse. This rational argument about a competitive threat can become a slogan around which all the other reasons gather, with resisting stakeholders attaching their defensive flag to the “no price increase” mast.

As always, the issue is to find the right balance and truly understand where the price increase opportunities lie. Of course the high volume product items (SKUs) are candidates that need extreme caution, since they are typically the headline products that most clients and competitors can compare and undercut. But in that complex mire of product detail there are typically interesting opportunities and pricing nuggets.

Looking at the pricing increase challenge from the point of view of mining the data complexity usually offers up some interesting and tactically defensible opportunities. These will vary in rationale, for example: legacy product, servicing opportunities, supply chain performance, product or customer tail, volume commitments etc. Thinking through a range of business rules and criteria and positioning these correctly will offer a number of on-going reasons for increasing prices that are both defensible and sustainable.

Taking such a granular and more targeted approach to price increases will offer a multitude of small incremental options. The approach will typically leverage a multi-segmentation approach and can typically lead quite easily to 2-3 margin increase to add to that bottom line. By managing a tailored, adaptive and structured approach to pricing that uses complexity to its advantage you can avoid many of the pitfalls of reckless price increase initiatives.

In short, the best approach to pricing is finding opportunities in the difficulties and constraints that others prefer to avoid. A business’s capability to navigate complexity will also avoid the trap of a top down reactive edict that compels the business to increase prices or drive effort towards premium segments without a sound approach and rationale.

So yes there is NEVER a good time to do a price increase but equally you ALWAYS need to consider doing so despite the difficulties and reasons not to!

Author: Patrick Mosimann

Posted on January 12, 2016 by Danielle Mosimann

An Elasticsearch Journey

Over the last 3-5 years big data technologies have become an increasingly important factor in analytics. A few years ago our company recognised that, as an advanced analytics provider, our skills had to include working with big data. To deliver our best analysis we had to move beyond traditional SQL to unstructured data, and the team explored different platforms which would enable us to do this. Elasticsearch stood out initially because it is structure agnostic and can store ALL types of data, and so we embarked on a journey of learning with this application. By investing our time and skills in Elasticsearch our company was investing in its long-term abilities, as success in the ever-changing analytics landscape requires growing alongside your tools.

Over the years of using Elasticsearch both our company and Elasticsearch have vastly improved their capabilities and so this piece will cover the key projects in our journey and how Elasticsearch facilitated them. Additional upcoming blog posts will individually explore these big data projects and the multitude of benefits and features of Elasticsearch that enabled them in more detail.

ES is used to represent the term Elasticsearch.

Elasticsearch as a Data Store (2012)

Until a big data proof of concept for a travel industry client we had mostly used SQL. However, SQL has significant cost implications when scaling up and we needed a highly concurrent application to quickly write to a data store without costing an arm and a leg.

CouchDB could get the data in but it wasn’t easy to extract the data in a meaningful way. Although exporting the data from ES (version ~0.10) was a challenge, as a data store it was beneficial in a number of ways:

  • Easy to write data
  • Brilliant search functionality
  • Open source
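
To give a feel for what "easy to write data" can mean in practice, here is a minimal sketch of preparing documents for Elasticsearch's bulk endpoint, which accepts newline-delimited JSON: an action line followed by a document line. The index name and document fields are invented for illustration, and the action format is the present-day one. The early ES versions we used in 2012 also expected a type name next to the index.

```javascript
// Sketch only: "bookings" and the document fields are hypothetical examples.
// Builds the newline-delimited body expected by the _bulk endpoint:
// one action line, then one source line, for every document.
function toBulkBody(indexName, docs) {
  return docs.map(function (doc) {
    var action = JSON.stringify({ index: { _index: indexName } });
    return action + "\n" + JSON.stringify(doc) + "\n";
  }).join("");
}

var payload = toBulkBody("bookings", [
  { id: 1, route: "LHR-JFK" },
  { id: 2, route: "LOS-ABV" }
]);

// The body must end with a newline; it would be POSTed to /_bulk.
console.log(payload);
```

Batching many documents into one request like this is what keeps write throughput high when a lot of clients are writing at once.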

Elasticsearch as a Reporting Store (2013)

In an analytics and audit project for another online travel agency we analysed web server logs to work out their conversions and success rate from their enormous amount of online data. We needed not only to write data but also to import, analyse and report on 250 million records of unstructured data without, once again, costing an arm and a leg.

From using ES (~0.19) as a reporting store, the team discovered that ES:

  • Allows for a high rate of ingestion
  • Keeps data compact and therefore storage costs low
  • Only enables analysis of data by means of “facets” – an aggregated level of data based on a search query
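
For readers who never met facets, the sketch below shows roughly what a terms facet request looked like and how the same question is asked with the aggregations that later replaced facets. The field names (`path`, `status`) and the facet name are assumptions for illustration, not the fields from our project.

```javascript
// Sketch only: field and facet names are hypothetical.
// Legacy style: a search query plus a "facets" section that counts
// the most common values of one field among the matching documents.
function termsFacetRequest(query, facetName, field, size) {
  var facets = {};
  facets[facetName] = { terms: { field: field, size: size } };
  return { query: query, facets: facets };
}

// Present-day style: the same count expressed as a terms aggregation.
// size: 0 asks for the aggregated buckets only, not the matching hits.
function termsAggRequest(query, aggName, field, size) {
  var aggs = {};
  aggs[aggName] = { terms: { field: field, size: size } };
  return { size: 0, query: query, aggs: aggs };
}

var checkoutHits = { term: { path: "/checkout" } };
var legacy = termsFacetRequest(checkoutHits, "by_status", "status", 10);
var modern = termsAggRequest(checkoutHits, "by_status", "status", 10);

console.log(JSON.stringify(legacy, null, 2));
console.log(JSON.stringify(modern, null, 2));
```

Either form returns one aggregated view per search query, which is why anything beyond simple counts had to be done elsewhere at the time.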

With limited means of analysis within ES we moved the relevant data to RedShift in order to analyse and segment it, although this was a high cost option that made up 10%-15% of our total project cost. When it came to reporting, however, we returned to our beloved platform, using both ES and its dashboard, Kibana, to provide high standard reporting for the project.

Elasticsearch as an Analytical Store (2014)

We needed to find a better way to analyse the necessary data within ES and so the team developed two Python libraries for internal analysis using ES, Python and Pandas. One, Pylastic, is a Python wrapper around ES and the other, Pandastic, is a wrapper around Pylastic for use with Pandas. We gained:

  • A unified data layer
  • Simplified querying. SQL instead of JSON for our data scientists.
  • Bespoke terms to better fit our company, taking away Elasticsearch jargon.
  • Easy data extraction.

Pylastic Example
On the left, sample raw Elasticsearch JSON query. On the right, sample code in Pylastic to write the same query as on the left.
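
Pylastic is an internal Python library and its API is not reproduced here. The sketch below, written in JavaScript to match the other code on this blog, only illustrates the general idea of a thin wrapper: a short, readable specification goes in and the verbose Elasticsearch JSON comes out. The function name `buildSearch`, its `equals` and `between` options and the field names are all hypothetical, and the output uses the present-day bool/filter form of the query DSL.

```javascript
// Hypothetical helper, NOT Pylastic's API: it turns a compact spec
// into an Elasticsearch search body so analysts need not hand-write JSON.
function buildSearch(spec) {
  var filters = [];

  // Exact-match conditions become "term" clauses.
  Object.keys(spec.equals || {}).forEach(function (field) {
    var clause = { term: {} };
    clause.term[field] = spec.equals[field];
    filters.push(clause);
  });

  // [from, to] pairs become inclusive "range" clauses.
  Object.keys(spec.between || {}).forEach(function (field) {
    var bounds = spec.between[field];
    var clause = { range: {} };
    clause.range[field] = { gte: bounds[0], lte: bounds[1] };
    filters.push(clause);
  });

  return {
    size: spec.size === undefined ? 10 : spec.size,
    query: { bool: { filter: filters } }
  };
}

var body = buildSearch({
  equals: { channel: "web" },
  between: { booked_at: ["2014-01-01", "2014-01-31"] },
  size: 100
});

console.log(JSON.stringify(body, null, 2));
```

The caller writes five short lines and never touches the nested JSON, which is the kind of simplification the wrappers gave our data scientists.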

All our data writing, storing, reporting and analysis for big data projects could now be done in ES, which enabled us to deliver results to clients faster because we no longer had to move data around as much as in the past.

Elasticsearch and GeoSpatial Analysis

Another big data project was for a client who wanted a database for their sales force that would show where to distribute their products in Nigeria. The project involved mapping all outlets in the country. Interviewers in the field carried handheld devices which recorded surveys, geolocations and images, and their area coverage was targeted. The team had to store and analyse this vast amount of recorded data in order to put the database together. Once again ES was chosen over SQL due to:

  • Presence of multiple data sources, including survey data, log data and images. ES can store ALL types of data.
  • A requirement for a large number of fields/columns. (>1000)
  • Geospatial features which supported the necessary geospatial queries. For example, we were able to specify geopoint & polygon data types in ES.
  • Evolving data. ES had no problem with the changing survey data, such as adding new fields, throughout the project.
  • ES could support “live reporting” and fast real time querying.
  • We expected the reporting data to be BIG and ES was well placed to manage this volume. Currently there are over 300 million records and we have only ~1/5 of the country covered.
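
As a rough illustration of the geospatial features mentioned above, the sketch below declares a `geo_point` field and builds a `geo_distance` filter for outlets near a given position. The field name `location`, the coordinates and the 5km radius are assumptions for illustration only. A small haversine helper is included so the same distance check can be reproduced on the client side.

```javascript
// Sketch only: the "location" field and all coordinates are hypothetical.
// Mapping fragment declaring a geo_point field for each outlet.
var outletMapping = {
  mappings: { properties: { location: { type: "geo_point" } } }
};

// Search body: outlets within `distance` (e.g. "5km") of a position.
function outletsWithin(lat, lon, distance) {
  return {
    query: {
      bool: {
        filter: {
          geo_distance: { distance: distance, location: { lat: lat, lon: lon } }
        }
      }
    }
  };
}

// Great-circle distance in kilometres between two {lat, lon} points
// using the haversine formula and a mean Earth radius of 6371 km.
function haversineKm(a, b) {
  var toRad = function (deg) { return deg * Math.PI / 180; };
  var dLat = toRad(b.lat - a.lat);
  var dLon = toRad(b.lon - a.lon);
  var h = Math.pow(Math.sin(dLat / 2), 2) +
    Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) *
    Math.pow(Math.sin(dLon / 2), 2);
  return 2 * 6371 * Math.asin(Math.sqrt(h));
}

var lagos = { lat: 6.5244, lon: 3.3792 };
var abuja = { lat: 9.0765, lon: 7.3986 };

console.log(JSON.stringify(outletsWithin(lagos.lat, lagos.lon, "5km")));
console.log(haversineKm(lagos, abuja).toFixed(0) + " km between the two points");
```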

Interviewer Summary Report: shows the path and the recorded stores for each interviewer by day.

Elasticsearch Benchmarking

As ES had become integral to our data and analysis work we were constantly looking for ways to improve performance and thus deliver valuable insights to our clients more rapidly. The team engaged in an ES benchmarking exercise with Bigstep’s Full Metal Cloud infrastructure and ran ES queries on 10 million documents (approx. 4GB of compressed data). We expected the metal cloud to perform better than traditional cloud-based dedicated servers, but the results were incredible: consistently 100-200% better than our existing infrastructure. As the queries became more complex, such as the geo distance calculations for our geospatial analysis, the performance difference became even more pronounced.

Main factors for Bigstep’s superior performance:

  • Wire-speed network
  • Hand-picked components
  • All-SSD storage based on enterprise drives

An Elasticsearch Future

Reviewing our journey with ES demonstrates that it has delivered on a breadth of projects and empowered us to vastly improve our big data and unstructured data analysis knowledge and skills. It has become a crucial tool in our team and will continue to be so in the future as both our capabilities and those of ES become even more advanced. Although we’re not really the type of company to make bold proclamations we would probably call ourselves Elasticsearch champions, as our team can’t seem to recommend it enough.

Author: Danielle Mosimann

Posted on June 6, 2015 by Danielle Mosimann

Big Data Journey: A few battle scars later

We interviewed Amit and Adam from our Advanced Analytics Innovation Lab – a couple of our data science leaders who have been involved in our Big Data Journey. They discuss battle scars and what they have learned, from the 4 Vs, data capture, storage and security to Hadoop, Redshift and Elasticsearch.

A Big Data Interview

Q: What do you think has been the biggest challenge? Has it been the integration and unification of some of that data? And the different variety of data we’re dealing with? Is it being able to handle it fast? Is it being able to store it?

I think the first challenge was actually getting our head around where the best place to start was. There’s quite a lot of buzz words and quite a lot of different ideas which are all related to big data. Each project is different. It took us a long time to get to understand all the different components that make up that ecosystem and decide which bits are used first. So when you go into a certain type of project should you start with Hadoop or should you start with Elasticsearch? They do different things. It’s getting yourself up that learning curve of what each of those niches are and where you use them for different things. That was a real learning curve for us. After two years we now feel that we’re there and we’ve done a lot with different systems and feel a lot more comfortable to choose the right tools for the job.

Would you choose Hadoop? Would you choose Redshift? Or if you’re doing visualisation maybe you choose to base it on Elasticsearch because of the quickness? The speed of return of the queries and things like that.

“Ecosystem” is a great description for working with big data because it’s not a linear process. It’s an ecosystem you have to grow. And there isn’t one tool that solves everything. As we’ve been evaluating different tools they’ve been on a journey too. Like Elasticsearch, they’ve come a long way from when we first started using them. They’ve learnt from our use cases, in which they saw our speed issues, our storage issues and some of our ad hoc querying as well. But then in terms of ecosystem, we have a lot of SQL analysis tools here as well. People who know SQL. So we have played quite a bit with Redshift as well.

There are different stages in projects. We often start with getting things going really quickly and trying to understand things. And there are certain tools that are a lot quicker to get going with. Tools that are used are based on the client need and the problem we need to solve.

Q: So all of this is constantly changing – we’re constantly evaluating. But how has it been with clients? In my experience it’s been trying to get clients to understand that their data policies need to change. Changing their perception of storage and retention so they are able to defensively delete certain aspects that they don’t need. But also being able to capture things that they might not have captured in the past so as to give a rich story, what they want to read from their analysis. How has your experience been from a technical perspective?

Even beyond a technical perspective it’s about understanding what data you need to capture and how long you need to store it for. We found that actually a lot of companies need to go through the process of trying to understand what they need to get the best value from their data. What they need to keep, how they best store it and the security they need to put in place. Deciding the teams as well, to work with it. With the technical aspects, things like security are not as hard as deciding policies in the beginning and getting that structure in place. We found that going through that process of understanding what you need can be the most difficult thing.

The questions come. Why do we have to do certain things? Why do you want that much data? How are you actually going to transfer that data? So all 4 Vs come into play, not just for us to understand but for the client to understand.

Tackling how you use cloud computing if you have sensitive data. Do you mask the data? What actually is or what isn’t sensitive? The IT infrastructure of a company will have their policies but how do they evaluate if something is secure or not? It’s been an issue we’ve had to work through a number of times.

How you move these massive volumes of data between different cloud solutions or even between regions within a cloud solution can often be a problem if you have terabytes and terabytes of data. It can be pretty difficult when keeping security in mind and ensuring that process is secure. We came across some products and tools which can move absolute bucketloads of data in seconds and with security in place which we wouldn’t have come across if we weren’t in this space. That sort of thing’s only possible once you get in there and go on that journey.

Q: How has the leap from traditional SQL to dealing with big data been?

We’ve been through a bit of a transition here. There has been a lot of work around different technologies which aren’t directly related to the big data space which other people have been working on. JavaScript visualisation for example, which definitely complements our work. There’s a lot of different work going on in various areas that all come together in the same sort of field. We’ve progressed as a company in massive leaps and bounds.

Q: Are statistics capabilities easier with the big data ecosystem?

We have a number of stats guys and they’re getting into that role of doing things not just in small scale but on large scale and using the whole set available rather than sampling. Definitely advances in that area as well.

In terms of Big Data, there have been those who say that everyone’s talking about it but nobody really knows how to do it. I like to think we’ve graduated.

In the last 2 years we’ve gone from talking conceptually about big data and all these tools to ticking a lot of different boxes in terms of use cases. The fact that we have different teams working on different projects means we get a large variety of use cases. In that sense we almost know what we’re talking about. The reason I say almost is because it’s an ongoing journey. There are certain things we haven’t touched upon yet and we can always do more!

Posted on March 18, 2015 by Danielle Mosimann

Goldilocks and the D3 Bears

As open-source software goes from strength-to-strength, one area particularly close to my heart is that of charting libraries and APIs.  Like many areas of software, the open-source community has revolutionised the way that data is presented on the web.

Arguably the most significant open-source visualisation project is D3 (data-driven documents). However, many an inexperienced JavaScript developer’s hopes have been dashed against the rocks of D3. It is NOT a charting API. It’s a library which helps a developer map data to elements on a web page (think jQuery for data if you are technically minded). That means you CAN create charts; in fact I’d say you can create any chart you could ever imagine, and you can do it much more easily than with raw JavaScript. However, the spectrum of JavaScript programmers is broad and I would argue that D3 still requires a fairly high level of skill if you want to do something from scratch. In the hands of an expert the results are magnificent, but sadly the majority of D3 implementations I have seen appear to have been built by taking an example and hacking at it until it fits the required data.

To address this, a number of open-source projects have emerged with the specific goal of drawing charts in D3. Their restricted reach leads to greater simplicity and opens the door to many more users. However, when we came to look for an API which meets the needs of our analysts, many of whom come from an Excel rather than JavaScript background, we couldn’t find that crucial Goldilocks zone between complexity and limitation. This was what spurred us to create our own. The result is dimple, a JavaScript library which allows you to build charts using a handful of commands. The commands can be combined in myriad ways to create all sorts of charts and the results can be manipulated with D3 if you need to do something really unusual. The main limitation is that it only supports charts with axes for now (pie charts are in the works), but it works in a way which ought to be easily understood by anybody with some basic programming knowledge.

Dimple Price Range Chart

The example above is from the advanced section of the site, but still has fewer than 20 lines of JavaScript. To get started with a simpler example, why not copy and paste the code below into notepad, save it as MyChart.html, open it in your favourite browser and then sit back admiring your first bar chart.

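   // Assumes this runs in a <script> block on a page that has already loaded D3 and dimple.
   var svg = dimple.newSvg("body", 800, 600);
   var data = [
     { "Word":"Hello", "Awesomeness":2000 },
     { "Word":"World", "Awesomeness":3000 }
   ];
   var chart = new dimple.chart(svg, data);
   chart.addCategoryAxis("x", "Word");
   chart.addMeasureAxis("y", "Awesomeness");
   chart.addSeries(null, dimple.plot.bar);   // plot type: bar, bubble, line or area
   chart.draw();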

The brevity is good but it’s the flexibility and readability which we were really shooting for.  So try switching the letters “x” and “y” on the add axis lines and you get a horizontal bar chart, change “bar” in the “addSeries” line to “bubble”, “line” or “area” and you’ll get those respective chart types.  Or better still copy the “addSeries” line and change the plot to get a multiple series chart.  You can go on to add multiple axes, different axis types, storyboards (for animation), legends and more.  For ideas see the examples, or if you are feeling brave the advanced examples which I try to update regularly.
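
The variations described above can be sketched as one small helper. The function name `drawWordChart` and its `horizontal` and `plots` options are made up for this illustration and are not part of dimple; only the dimple calls inside it come from the library. The dimple object is passed in as a parameter so the helper has no hidden dependency on a global.

```javascript
// Sketch only: drawWordChart and its options are hypothetical helpers.
// options.horizontal swaps the axes; options.plots lists the series types.
function drawWordChart(dimpleLib, data, options) {
  var opts = options || {};
  var plots = opts.plots || ["bar"];
  var categoryPosition = opts.horizontal ? "y" : "x";
  var measurePosition = opts.horizontal ? "x" : "y";

  var svg = dimpleLib.newSvg("body", 800, 600);
  var chart = new dimpleLib.chart(svg, data);
  chart.addCategoryAxis(categoryPosition, "Word");
  chart.addMeasureAxis(measurePosition, "Awesomeness");

  // One addSeries call per plot type gives a multiple series chart.
  plots.forEach(function (name) {
    chart.addSeries(null, dimpleLib.plot[name]);
  });

  chart.draw();
  return chart;
}

// Only runs in a browser page where D3 and dimple have been loaded.
if (typeof dimple !== "undefined") {
  drawWordChart(dimple, [
    { "Word": "Hello", "Awesomeness": 2000 },
    { "Word": "World", "Awesomeness": 3000 }
  ], { horizontal: true, plots: ["bar", "line"] });
}
```

Here the call at the bottom draws a horizontal bar chart with a line series laid over it, using the same two rows of data as the first example.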

Author: John Kiernander

Posted on January 31, 2015 by Danielle Mosimann

A Single Version of the Truth – Is it Just a Myth?

Why do we hear companies talking about a “single version of the truth”? It is because of the frustration they have experienced when multiple people argue about which numbers are correct rather than focusing on what the metrics mean. Finding out what the metrics really mean would allow them to improve operational performance and business results. They want data consistency so they can understand trends, variances, causes and effects. They want easy and quick access to information that they can trust. They do not want to wait days for IT or an overworked analyst to supply data that they need but may not even be able to rely on.

In many companies existing data warehouses and reporting systems are so fragmented and widely dispersed that it’s impossible to determine the most accurate version of information across an enterprise. A proper information strategy with solid master data management (MDM) and infrastructure often takes years to develop and requires substantial upfront investment. In the meantime, companies’ departments are left to develop their own short-term solutions, resulting in too many data sources reporting different information. This information is incomplete, lacks structure and is sometimes even misleading.

Imagine a mid-sized and fast growing international business with a strong portfolio of brands. They know very well how to manufacture a good quality product and effectively pitch it to a consumer. However, they struggle with large amounts of data sitting in multiple Excel spreadsheets and legacy systems, without access to analytics and consistent reporting. Central management rarely has an in-depth view of what is happening in local markets. It takes a long time and some frustration to get a simple market-share data point, not to mention the time and frustration needed to gain insight into competitor and product performance on a regular basis.

Would it not be great to have a central single source of reliable and consistent information enabling quick and easy access and reporting, reducing manual work and delivering performance results quicker? It is possible. You don’t have to ‘boil the ocean’ and try to incorporate all existing data at once. Start with market or sales data to get a consistent and accurate view of the critical KPIs, to improve segmentation and to get an insight into key areas before adding on more…

With new technology, methodology for data unification and the emergence of data visualisation tools that are revolutionising decision making, that panacea of the “single-version” isn’t just a dream. Whether it’s your legacy vendors or open-source options, this new wave of technology enables delivery of the right information and analysis to the right person, at the right time and on a regular basis. It can help to overcome the need for large infrastructure investment while developing your metrics, stakeholder strategy and reporting requirements.

It’s not an impossible dream, and although the range of options, methodologies and technologies might feel like an overwhelming wave, you can ride the swell towards an optimal solution.

Author: Nadya Chernusheva

Posted on November 24, 2014 by Danielle Mosimann