Alternative Data Sources and the Future of Data Engineering

by Uday Kumar, Chief Digital Officer

February 11, 2022

When building transaction fraud and credit underwriting decision engines and #ml models at my previous job, I came across a concept that was new to me: “Alternative Data Sources,” or ADS. #alternativedata draws on non-traditional data sources so that, when you apply #analytics to it, you get additional insights that complement the information you receive from traditional sources.

Here is more context on the business drivers fueling the ADS movement from my first-hand experience:

  • Card issuers and retail banks have long relied on credit bureaus (Equifax, Experian, and TransUnion) and their own organically acquired customer data for risk assessment and decision making.
  • However, the #pandemic turned consumer behavior on its head, and these institutions started to recognize the need for, and value of, alternative data that may not have been on their radar a few years earlier.
  • Banks started sourcing and orchestrating alternative data like income, employment, utilities payment history, and rental payments into their decisioning platforms and products.
  • Companies that know how to properly harness alternative data can greatly improve their decision-making and gain a significant competitive advantage.

Organizations can unlock massive business value with the help of ADS, but I learned the hard way that there are a ton of challenges in getting there. Here are some of them:

  • You need to find a trustworthy ADS
  • You need to validate your hypothesis that the ADS will provide meaningful lift to your business outcomes/needs
  • You need to build the #datapipelines to ingest the data into your enterprise (data transformation is another challenge)
  • You need to build adequate #observability to monitor #dataquality
  • You need to orchestrate the ADS into your models and engines
  • You need to do a boat-load of testing on resulting decisions, scores, and outcomes
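
To make the #observability and #dataquality point a bit more concrete, here is a minimal Python sketch of a batch-level check you might run on an incoming ADS feed before it reaches a model. Everything in it is an assumption for illustration: the RentalPaymentRecord fields, the quality_report helper, and the 5% null-rate and 45-day freshness thresholds are hypothetical, not taken from any real vendor feed or from the platforms I worked on.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class RentalPaymentRecord:
    """Hypothetical shape of one row in a rental-payment ADS feed."""
    customer_id: str
    monthly_rent: Optional[float]
    on_time_payments_12m: Optional[int]
    as_of: date  # date the vendor last refreshed this row


def quality_report(
    records: List[RentalPaymentRecord],
    today: date,
    max_null_rate: float = 0.05,  # assumed threshold, tune per feed
    max_age_days: int = 45,       # assumed freshness window
) -> dict:
    """Return row count, pass/fail flag, and human-readable issues."""
    total = len(records)
    if total == 0:
        return {"row_count": 0, "passed": False, "issues": ["empty batch"]}

    issues = []

    # Completeness: share of missing values per field.
    for field in ("monthly_rent", "on_time_payments_12m"):
        nulls = sum(1 for r in records if getattr(r, field) is None)
        rate = nulls / total
        if rate > max_null_rate:
            issues.append(
                f"{field}: null rate {rate:.0%} exceeds {max_null_rate:.0%}"
            )

    # Freshness: rows the vendor has not refreshed recently.
    stale = sum(1 for r in records if (today - r.as_of).days > max_age_days)
    if stale:
        issues.append(f"{stale} record(s) older than {max_age_days} days")

    return {"row_count": total, "passed": not issues, "issues": issues}


# Made-up sample batch: one missing rent value and one stale row.
batch = [
    RentalPaymentRecord("c1", 1500.0, 12, date(2022, 1, 31)),
    RentalPaymentRecord("c2", None, 11, date(2022, 1, 31)),
    RentalPaymentRecord("c3", 980.0, 9, date(2021, 10, 31)),
]
report = quality_report(batch, today=date(2022, 2, 11))
for issue in report["issues"]:
    print(issue)
```

In practice a check like this would probably sit at the end of the ingestion pipeline and feed an alert or dashboard, so that a bad batch is caught before it changes any scores or decisions.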

The range of possible data sources is growing rapidly, from consumer online activity (e.g., social, reviews, search) to geospatial (e.g., footfall) to industry-specific data (e.g., trade flows and shipping).

For business process owners, the BIGGEST CHALLENGE will be the speed of data ingestion. One best practice I would recommend to leaders is to define smaller steps and projects that can be executed immediately and iterated on continuously. When done right, this can be an org-wide enabler, adding value not just to operations and research functions but also to demand planning, logistics, and customer support teams.

I am reliving these lessons and best practices as my team at Akira Technologies attempts to build a POC to improve the forecasting of the next forest fire and the optimal mobilization of resources (man, machine) for a timely and effective response. 

#dataengineering #dataanalytics #datatransformation #machinelearning #productstrategy #fridayinspiration #disastermanagement #fema #nationalguard