Thursday, October 1, 2015

A guide to the little jobs that will make big data work

Time to get your hands dirty.

Big data is one of the more glamorous terms in today’s IT vernacular, but in reality, making it work is about small, dirty jobs that consume a lot of resources for little immediate return.
As businesses adopt an obsessive focus on ‘the customer’, a clear divide is opening up between companies such as Amazon and Uber that run their businesses on analytics, and the silent majority simply trying to do things a little bit better.
The secret to staying on the right side of this gap is investing early in the little things, like metadata governance, standardisation and data glossaries.

Here are a few simple tips to guide the formative days of your business’s big data future.


Cleaning up the metadata

The big data holy grail for enterprises is a single customer view.
But the reality is that most organisations have somewhere between eight and ten different customer databases, each existing to support a different transactional system, and each customer probably has a different ID in each of those systems.
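To make that concrete, here is a minimal sketch in Python of a master customer record that cross-references those local IDs. The system names and ID formats are purely hypothetical.

from dataclasses import dataclass, field

@dataclass
class MasterCustomer:
    master_id: str
    # Map of source system name -> that system's local customer ID
    source_ids: dict = field(default_factory=dict)

# One customer, known by a different ID in each transactional system
customer = MasterCustomer(master_id="CUST-000123")
customer.source_ids["crm"] = "A-98217"
customer.source_ids["billing"] = "0005521"
customer.source_ids["web_store"] = "u-7743f9"

print(customer.master_id, customer.source_ids)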

This is only going to become more daunting as the internet of things becomes a day-to-day reality and sensors are used to detect a customer in-store, track their browsing behaviour and make them offers via their devices.

We have to remember that big data usually hasn’t been cleaned up and integrated into a single source of truth – indeed the opposite is the case.

To understand what is in the data lake, we need high quality metadata to track the various data stores and to distil some meaning from them.
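As a rough illustration, a metadata register can start out as something as simple as the sketch below, which records what each store holds, who owns it and how often it is refreshed. The store names and attributes are illustrative assumptions, not a prescribed schema.

# Each entry describes one store feeding the lake: what it holds,
# who owns it and how often it is refreshed.
stores = [
    {"name": "crm_customers", "owner": "sales_ops",
     "refresh": "nightly", "contains": "customer master records"},
    {"name": "web_clickstream", "owner": "digital",
     "refresh": "hourly", "contains": "raw browsing events"},
]

for store in stores:
    print("{name}: {contains} (owner: {owner}, refresh: {refresh})".format(**store))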


Metadata Tsars

This is where good quality governance becomes critical.
Most enterprises have some form of data governance, but its focus is usually restricted to higher-level priorities than metadata.

But you really can’t make any sense of the vast amounts of data unless you have a comprehensive metadata management approach.

This means taking on historical, headache-inducing problems such as inconsistent data types and data names, for example dates stored as variable character fields.
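For example, dates captured as free text can be normalised into a single standard format. The sketch below assumes a handful of known input formats; anything that doesn't match is flagged for review rather than guessed at.

from datetime import datetime

# Formats seen in the legacy fields; extend the list as new ones turn up.
KNOWN_FORMATS = ["%d/%m/%Y", "%Y-%m-%d", "%d %b %Y"]

def normalise_date(raw):
    """Return an ISO 8601 date string, or None if no known format matches."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None  # flag for manual review rather than guessing

print(normalise_date("03/07/2015"))   # 2015-07-03
print(normalise_date("3 Jul 2015"))   # 2015-07-03
print(normalise_date("not a date"))   # None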

Correcting these is a significant exercise, often with little to show for it until a sizable investment has already been made. Unfortunately, these are jobs that can’t be avoided.


Data lakes not data dumps

There is no point in building a data lake if the information in it can’t be accessed. That is a data dump.
Enterprises have traditionally struggled to implement data warehouses.

At best, they have been a reasonable home for basic reporting. At worst, their shortcomings have led to a proliferation of these environments, and the truth is that most enterprises now run a number of data warehouses.

The result is an architecture landscape splintered into a number of separate data stacks.

We have learnt this lesson, so let’s not repeat the same mistakes with big data.


Think like a librarian

The right approach is to do what librarians do: establish a data glossary that catalogues the enterprise’s data sources.

This does not need to be all-encompassing; you do not need to boil the ocean. Instead, start with a common set of critical business data elements, then focus on enriching the catalogue so that sources are noted and the applications that use the data are tracked.
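A glossary entry can start out very simply. The sketch below is one hypothetical shape for it, recording a definition for each critical data element along with its sources and consuming applications.

from dataclasses import dataclass, field

@dataclass
class GlossaryEntry:
    name: str          # business name of the data element
    definition: str    # agreed business definition
    sources: list = field(default_factory=list)    # systems of record
    consumers: list = field(default_factory=list)  # applications that use it

glossary = {
    "customer_email": GlossaryEntry(
        name="customer_email",
        definition="Primary email address used to contact the customer.",
        sources=["crm"],
        consumers=["marketing_platform", "billing"],
    ),
}

print(glossary["customer_email"])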

Like a library, this enables sharing: anyone in the business can use their BI tool of choice to access a shared, validated data set.

For financial services, it is also critical to maintain data lineage, which in essence means that regulated data is never deleted. If we find an issue and want to correct it, we maintain a history of the change by appending rather than overwriting.
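As a simple illustration, corrections can be modelled as an append-only history, where the latest version gives the current value and the full list preserves the lineage. The record structure below is hypothetical.

from datetime import datetime, timezone

history = []  # every version of the record is kept, never deleted

def append_correction(record_id, field_name, new_value, reason):
    """Record a new version of a field; earlier versions remain in place."""
    history.append({
        "record_id": record_id,
        "field": field_name,
        "value": new_value,
        "reason": reason,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "version": len(history) + 1,
    })

append_correction("CUST-000123", "date_of_birth", "1980-04-02", "initial load")
append_correction("CUST-000123", "date_of_birth", "1980-04-20", "typo corrected")

print(history[-1]["value"])  # current value: 1980-04-20
print(len(history))          # 2 versions retained for lineage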


Elephants are afraid of mice

Doug Cutting’s son had a toy elephant named Hadoop, and that inspired the name of his influential big data framework. But most of us think of elephants as giant beasts roaming sub-Saharan Africa.

I’ve often heard that elephants are afraid of mice, and in the data world at least this seems to hold true.
A giant Hadoop cluster looks all-powerful, but without small things like high-quality metadata in place, the elephant is much weaker than you would expect.

When it comes to big data the small things really matter.
