Predicting Store Sales on the Sales Beat


FMCG marketers sell through a chain of distributors, who in turn sell to retailers. Each distributor is allocated a territory, and his task is to make sure that every retailer in that territory is always well stocked with the company's product lines. The distributor uses a team of salesmen who visit each retailer on a weekly or fortnightly basis to take orders, and these orders are fulfilled within one or two days. Very often the distributor pushes stock into the retailer's store through schemes and incentives, always aiming to occupy as much shelf space as possible. The retailer, however, may not want to stock the products the distributor pushes. He is interested only in those product lines that consumers want and on which his margins are healthy.

The Problem at Hand

The distributor's salesman takes the retailer's order on a handheld device. The device displays about 8-10 items at a time in a menu, and the salesman scrolls from top to bottom, picking SKUs from the menu and entering the quantity ordered. If the order is large, say 25 items or more, taking it may take more than fifteen minutes, and it is safe to assume that most of this time is spent scrolling up and down. This is a colossal waste of the salesman's time; he would have been better off with the old-fashioned pen-and-paper method. If scrolling time could be reduced, the time saved could be spent cross-selling other products that the retailer may have forgotten to ask for or may be reluctant to order.


This paper offers a methodology that predicts which items a store is most likely to buy. Once these items are known, the handheld is configured to display them first on the screen, pushing items with a low likelihood of purchase to the bottom. This way the salesman spends less time scrolling and more time selling. When the salesman visits the next store on his beat, a different set of items is displayed, so the order screen is customized at the store level.
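A minimal sketch of this store-level reordering, assuming the predictive model already supplies a purchase probability for each SKU at the store being visited. The function order_menu and the SKU codes are hypothetical; the point is only that every item stays on the menu, with the likely purchases moved to the top.

```python
def order_menu(skus, likelihood):
    """Return SKUs sorted so the most likely purchases come first.

    skus: SKU codes in the device's default menu order.
    likelihood: dict SKU -> predicted purchase probability for this store and
                visit (hypothetical input); SKUs without a prediction count as 0.0.
    """
    # sorted() is stable, so SKUs with equal probability keep their default order.
    return sorted(skus, key=lambda s: -likelihood.get(s, 0.0))


default_menu = ["SKU01", "SKU02", "SKU03", "SKU04", "SKU05"]
store_probs = {"SKU03": 0.92, "SKU05": 0.71, "SKU01": 0.10}
reordered = order_menu(default_menu, store_probs)
print(reordered)  # ['SKU03', 'SKU05', 'SKU01', 'SKU02', 'SKU04']
```

Because the sort is stable, unpredicted items remain in their familiar default order at the bottom, which keeps the menu predictable for the salesman.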

Data Requirement

We obtained 53 weeks of data from the FMCG marketer on a monthly basis. The data is as follows:

Some additional variables were derived from the transaction date, such as:


The first stage in the process was to hold out the last 10 weeks of sales for validating our predictions. Three methods were then used to make predictions.
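A time-based hold-out like this can be sketched as follows. The tuple layout (store_id, sku, qty, txn_date) and the function name split_last_weeks are hypothetical, since the source does not describe its data schema; the sketch simply treats everything after a cutoff 10 weeks before the last transaction as validation data.

```python
from datetime import date, timedelta


def split_last_weeks(transactions, holdout_weeks=10):
    """Split transactions into training and hold-out sets by date.

    transactions: list of (store_id, sku, qty, txn_date) tuples (hypothetical schema).
    Rows dated after `cutoff` (the last transaction date minus `holdout_weeks`
    weeks) form the hold-out set; everything else is used for training.
    """
    last_day = max(t[3] for t in transactions)
    cutoff = last_day - timedelta(weeks=holdout_weeks)
    train = [t for t in transactions if t[3] <= cutoff]
    holdout = [t for t in transactions if t[3] > cutoff]
    return train, holdout, cutoff


sales = [
    ("S1", "A", 5, date(2020, 1, 6)),
    ("S1", "B", 2, date(2020, 3, 2)),
    ("S2", "A", 1, date(2020, 12, 28)),
]
train, holdout, cutoff = split_last_weeks(sales)
print(cutoff, len(train), len(holdout))  # 2020-10-19 2 1
```

Splitting by date rather than at random keeps the validation honest: the models never see sales from the weeks they are asked to predict.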

The first method was clustering. Stores were clustered into 9 groups based on the items and quantities they bought. Using these store-level similarities, predictions were made for each store: stores in the same cluster were predicted to buy the same items.
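The source does not name the clustering algorithm. As a minimal sketch, assuming k-means on a store-by-item matrix of quantities, and a hypothetical rule that a cluster "buys" an item when at least half of its stores bought it, this step might look like the following. The paper used 9 clusters; the toy data below uses 2.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Plain k-means (Lloyd's algorithm) on the rows of X; returns cluster labels."""
    rng = np.random.default_rng(seed)
    # Initialise the centres with k distinct stores chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assign each store to its nearest centre (squared Euclidean distance).
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centre to the mean of its stores; an empty cluster keeps its centre.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def cluster_predictions(X, labels, min_share=0.5):
    """Predict, per cluster, the item columns bought by >= min_share of its stores."""
    bought = X > 0
    preds = {}
    for j in np.unique(labels):
        share = bought[labels == j].mean(axis=0)
        preds[int(j)] = np.flatnonzero(share >= min_share).tolist()
    return preds


# Rows are stores, columns are items, values are quantities (toy data).
X = np.array([[10, 0, 5, 0], [12, 0, 4, 0], [0, 8, 0, 6], [0, 9, 0, 7]])
labels = kmeans(X, k=2)
preds = cluster_predictions(X, labels)
print(labels, preds)
```

Every store in a cluster then receives that cluster's item list as its prediction.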

The second method, association analysis, generated rules such as "stores that bought item A will also buy item K".
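Rules of this kind are usually scored by support (how many stores bought both items) and confidence (how often buying A goes with buying K). The sketch below mines one-to-one rules from per-store baskets; the function name pair_rules, the basket format and both thresholds are illustrative assumptions, not the paper's settings.

```python
from collections import Counter
from itertools import permutations


def pair_rules(baskets, min_support=0.3, min_confidence=0.6):
    """Mine rules "stores that bought A also buy K" from store baskets.

    baskets: dict store_id -> set of items bought (hypothetical input).
    support(A, K)    = share of stores that bought both A and K
    confidence(A->K) = stores that bought both / stores that bought A
    """
    n = len(baskets)
    item_count = Counter()
    pair_count = Counter()
    for items in baskets.values():
        item_count.update(items)
        # Count every ordered pair (A, K) of distinct items in this basket.
        pair_count.update(permutations(sorted(items), 2))
    rules = []
    for (a, k), both in pair_count.items():
        support = both / n
        confidence = both / item_count[a]
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, k, support, confidence))
    # Strongest rules first.
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0], r[1]))


baskets = {
    "S1": {"A", "K", "B"},
    "S2": {"A", "K"},
    "S3": {"A", "C"},
    "S4": {"B", "C"},
    "S5": {"K"},
}
rules = pair_rules(baskets)
print(rules)  # A->K and K->A, each with support 0.4 and confidence 2/3
```

A store that has bought A but not K is then a candidate for K on its next visit.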

The third method used Cox's proportional hazards model to compute, for each store and each item, the probability of purchase on a given sales visit. These probabilities were based on past purchase patterns and indicated how likely the store was to buy the item on that visit. The items were sorted by probability, and the top 30 were chosen as the predicted sales. These predictions were compared month on month for each store. Finally, predictions were made for the future 10 weeks that had been kept aside.
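The paper does not give its covariates or fitting details. The sketch below assumes a Cox model has already been fitted, with the time since the store last bought an item as the duration, and shows how a baseline survival curve S0(t) and coefficients beta (both hypothetical values here) turn into a purchase probability for the next visit and then a ranked top-N list. Under proportional hazards, S(t | x) = S0(t) ** exp(beta . x), so the chance of a purchase in the next d days, given none in the t0 days since the last one, is 1 - S(t0 + d | x) / S(t0 | x).

```python
import math
from bisect import bisect_right


def baseline_survival(t, times, surv):
    """Step-function lookup of S0(t); S0(t) = 1 before the first event time.

    times: sorted event times; surv: S0 at each of those times (hypothetical
    values, e.g. taken from a previously fitted Cox model).
    """
    i = bisect_right(times, t)
    return 1.0 if i == 0 else surv[i - 1]


def purchase_probability(days_since_last, days_to_visit, x, beta, times, surv):
    """P(store buys the item by the next visit | no purchase since the last one)."""
    risk = math.exp(sum(b * xi for b, xi in zip(beta, x)))  # exp(beta . x)
    s_now = baseline_survival(days_since_last, times, surv) ** risk
    s_visit = baseline_survival(days_since_last + days_to_visit, times, surv) ** risk
    # If survival is already 0, the model treats a purchase as certain.
    return 1.0 - s_visit / s_now if s_now > 0 else 1.0


def top_items(probs, n=30):
    """Return the n SKUs with the highest purchase probability."""
    return sorted(probs, key=probs.get, reverse=True)[:n]


times, surv = [7, 14, 21, 28], [0.8, 0.5, 0.3, 0.1]  # hypothetical baseline
beta = [math.log(2)]  # one hypothetical covariate that doubles the hazard
history = {"A": (7, 7, [0.0]), "B": (3, 7, [0.0]), "C": (7, 7, [1.0])}
probs = {sku: purchase_probability(t0, d, x, beta, times, surv)
         for sku, (t0, d, x) in history.items()}
print(top_items(probs, n=2))  # ['C', 'A']
```

In practice the list would be cut at 30 items, as in the paper, rather than the 2 used in this toy example.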


All the models achieved an accuracy of 85% or more, with the clustering algorithm on top at 93%. Cox regression came next at 89%, followed by the association analysis model at 85%. The model also flagged "hot" cross-sell opportunities, items with a high likelihood of being sold, for the salesman to push.
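The source does not define how accuracy was measured. One plausible definition, assumed here purely for illustration, is the hit rate: the share of items each store actually bought in the hold-out weeks that appeared in its prediction list. The function name hit_rate and the input format are hypothetical.

```python
def hit_rate(predicted, actual):
    """Share of actually purchased (store, item) pairs that were predicted.

    predicted, actual: dict store_id -> set of SKUs (hypothetical inputs).
    """
    hits = total = 0
    for store, bought in actual.items():
        hits += len(bought & predicted.get(store, set()))
        total += len(bought)
    return hits / total if total else 0.0


predicted = {"S1": {"A", "B", "C"}, "S2": {"A", "D"}}
actual = {"S1": {"A", "B"}, "S2": {"A", "E"}}
acc = hit_rate(predicted, actual)
print(acc)  # 0.75
```

This metric rewards longer prediction lists, so comparing models fairly requires a fixed list length, such as the 30 items used for the Cox predictions.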


The Store Predictor Model is a great asset to the salesmen of FMCG companies that want to increase throughput into stores. It not only saves time in order taking but also alerts the salesman to cross-sell opportunities. It is a good example of how analytics and technology can be integrated to generate business value.