The Derelict Data Warehouse

A well-maintained and utilised data warehouse is a thing of beauty. Imagine all your data from disparate sources autonomously extracted, loaded and transformed into nice neat reporting tables. Picture, if you can, impactful analysis, company-wide data-driven decision-making, a true understanding of return on ad spend. It’s enough to make anyone weak at the knees.

But therein lies the problem. Companies (or ambitious individuals) can be too desperate to reach the promised land and rush headlong into a warehousing project without due care and attention. This results in a wasteland littered with what we playfully term ‘derelict data warehouses’.

All is not lost. Most of these failed endeavours are ‘fixer-uppers’ that can be salvaged with the right DIY skills. But before we reach for the toolbox, let’s first take a tour of the typical types of derelict data warehouse.

The Abandoned Warehouse

We’ve all been there. A project that never quite reached its end, or maybe the brainchild of someone who has long since left the company. Whatever the reason, the abandoned data warehouse is unused and unloved. The data is out of date, the documentation is missing, adoption is non-existent; heck, most people in the company don’t even know it exists. Worse still, it may come at a significant cost, both monetarily and from a data compliance standpoint.

The Duck Tape Warehouse

It might be charitable to call this hodge-podge of manually collected data and spreadsheet table joins a warehouse, but in a way it is – just in a very primitive form. A warehouse aims to get data out of silos, integrated and answering business questions that cannot be answered by a single source alone. And the most straightforward way to start doing this is manually. But a Duck Tape Warehouse is fragile, lacks standardisation and security, and drains the time (and threatens the sanity) of your best and brightest.
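To make this concrete, here is the kind of manual stitching such a ‘warehouse’ typically performs – joining ad spend from one silo to revenue from another to estimate return on ad spend. The data and field names below are made up purely for illustration:

```python
# Hypothetical extracts from two silos: an ad platform and an order system.
spend_by_campaign = {"summer_sale": 500.0, "brand": 1200.0}
revenue_by_campaign = {"summer_sale": 2250.0, "brand": 1800.0, "organic": 900.0}

def roas(spend, revenue):
    """Join the two sources on campaign name and compute return on ad spend."""
    return {
        campaign: revenue.get(campaign, 0.0) / cost
        for campaign, cost in spend.items()
        if cost > 0  # avoid dividing by zero for zero-spend campaigns
    }

print(roas(spend_by_campaign, revenue_by_campaign))
```

A couple of lines of code (or one spreadsheet lookup) gets an answer, which is exactly why teams start here – and exactly why the approach quietly accumulates until it becomes the fragile mess described above.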

The Over-engineered Warehouse

Built with the best intentions, this warehouse is a complex maze of tables, schemas, and ETL processes. While technically impressive, its complexity makes it challenging for users to navigate, understand, or derive insights from without extensive training or hours spent poring over documentation. Warehouses can come in all shapes and sizes, and a company’s likely use cases and maturity level need to be considered before deciding on the appropriate architecture and engineering approach.

The Untamed Warehouse

Not all companies have a full complement of skills within their data team. This can result in a skew in expertise, leading to solutions being created but their latent value never fully realised. Untamed Data Warehouses arise when companies have the engineering nous to get data from multiple sources into a cloud platform, but lack the knowledge of schemas, SQL, analysis and business needs required to tame the data and actually make use of it.

Audit to understand the problem

The above examples illustrate just some of the problems businesses face on their journey to building a data warehouse that delivers. I could have listed another 20. You may well have identified yourself in one of the above and be asking, “OK, so now what?”.

The answer may be knocking the whole thing down and starting again, it could be some minor renovations, or somewhere between the two. Data Warehouses are inherently complex due to the number of considerations and decisions needed during their creation and ongoing maintenance.

An audit can provide a more detailed understanding of the shortcomings and opportunities. We’ve put together a checklist of things to review, which you can find below.

Data Warehouse Audit Checklist

If you’d like an external perspective, Measurelab can do a complete audit of your data warehouse and present you with an estimate for the repair work – no chin scratching or sucking through teeth, guaranteed.

  1. Stakeholder feedback: Engage with the core users of the data, collecting their feedback and experiences.
  2. Performance analysis: Unravel any lurking performance glitches, from slow query execution times to potential bottlenecks.
  3. Storage efficiency: Dive into data storage patterns, identifying opportunities for more streamlined management.
  4. Cost analysis: Break down the financial implications of your setup, seeking strategies to optimise expenses without sacrificing performance.
  5. Schema review: Explore the intricacies of schema design, suggesting enhancements to boost both performance and usability.
  6. Granularity assessment: Check that the granularity of tables aligns with core reporting needs.
  7. Data integrity: Examine the data’s robustness, ensuring its completeness and accuracy.
  8. Integrity planning: Suggest strategies to maintain impeccable data quality in the future.
  9. Current pipeline health: Investigate the journey of your data, pinpointing any inefficiencies or bottlenecks.
  10. Pipeline improvements: Recommend alternative routes or tools where they offer a smoother flow of data.
  11. Security and compliance: Review who has access to the data and at what level, and how PII and other personal data are handled.
  12. Build a security plan: Suggest enhancements in line with industry standards to fortify your data defences.
  13. Disaster recovery: Assess the resilience plans in place, ensuring there’s a robust strategy for unforeseen calamities.
  14. Documentation review: Scrutinise existing documentation, highlighting gaps or areas that might benefit from further illumination.
  15. Continuous improvement: Propose methods to instil a culture of ongoing learning and knowledge refinement.
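A few of the checks above – freshness, storage and cost – lend themselves to simple automation. As a minimal sketch (the table metadata and thresholds here are hypothetical; in practice you would read last-modified times and sizes from your warehouse’s information schema, such as BigQuery’s `INFORMATION_SCHEMA` views):

```python
from datetime import datetime, timedelta, timezone

# Hypothetical metadata for two tables -- in a real audit this would be
# pulled from the warehouse's information schema, not hard-coded.
EXAMPLE_TABLES = {
    "reporting.daily_sales": {
        "last_modified": datetime(2024, 1, 2, tzinfo=timezone.utc),
        "bytes": 2 * 10**9,
    },
    "raw.events": {
        "last_modified": datetime(2024, 1, 31, tzinfo=timezone.utc),
        "bytes": 5 * 10**10,
    },
}

def audit_tables(tables, now, stale_after_days=7, large_bytes=10**10):
    """Flag tables that look stale or unusually large.

    The thresholds are illustrative; tune them to your own reporting
    cadence and storage budget.
    """
    findings = []
    for name, meta in tables.items():
        age = now - meta["last_modified"]
        if age > timedelta(days=stale_after_days):
            findings.append((name, f"stale: not updated for {age.days} days"))
        if meta["bytes"] > large_bytes:
            findings.append((name, "large: review partitioning and retention"))
    return findings

for table, issue in audit_tables(
    EXAMPLE_TABLES, now=datetime(2024, 2, 1, tzinfo=timezone.utc)
):
    print(f"{table}: {issue}")
```

Automating checks like these turns the audit from a one-off exercise into an ongoing health report, which feeds directly into the ‘continuous improvement’ item above.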
Written by

Matthew is the Engineering Lead at Measurelab and loves solving complex problems with code, cloud technology and data. Outside of analytics, he enjoys playing computer games, woodworking and spending time with his young family.
