Wednesday, 5 July 2017

Can Machine Learning Save Big Data?

Business Intelligence (BI) and Big Data (BD) are supposed to help enterprises improve their overall business control by identifying trends and anomalies and giving insight into what is really happening, so that they can take appropriate action to correct problems and address opportunities, as well as predict what might happen with a degree of informed confidence. Yet most businesses either fail to exploit BI and BD solutions or, even if they do, just don't get the expected returns.

One of the biggest problems facing anyone trying to do Business Intelligence or Big Data is the quality of the data. In some cases, someone has just chosen the wrong source of data (a Master Data Management issue). An example of this that I stumbled across many years ago was an organisation that tried to build a customer database for Sales and Marketing using financial sales ledger data. This caused immense problems because Finance is not interested in the same sort of data as Sales and Marketing and doesn't care who the key contacts are or what the organisation structure is, as long as someone pays the invoices.

However, an overwhelming problem is that the data is dirty. Legacy systems were usually not designed with BI data in mind, and external sources used for BD often use different terminology and definitions when dealing with the same thing. Alternatively, the same terminology may be used to mean completely different things by different people or organisations. Further issues arise when users are asked to input data which is of no use to them into a system. The act of inputting the data just complicates their day job without helping them do it, so they often enter default information. Additionally, other forms of context may be needed to work out what was meant by an item of data or a vague description.

This often reduces organisations to using a Subject Matter Expert (SME) to manually interpret data and put everything onto a spreadsheet to conduct a particular analysis. This is costly, laborious and slow. It may even be inaccurate, as SME fatigue (or conflicts of opinion between different SMEs) leads to inconsistency and errors in interpreting data. So many analyses just don't get carried out, or they produce information too late, lack detail beyond generalised interpretations, or may even be of suspect accuracy.

In some cases, if the analysis is likely to be needed frequently with new or updated data sets, it may be possible to write programs with interpretation rules to deal with many of the problems mentioned above. However, this is painstaking and laborious and often requires continuous maintenance, which builds in new costs and delays and limits its viability for many analyses.
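To make the idea of interpretation rules concrete, here is a minimal sketch in Python. It is an illustration only: the field (a free-text country entry), the patterns and the function name `interpret_country` are all hypothetical, not taken from any real system. It also shows why maintenance is a burden, since every new spelling or placeholder needs another rule.

```python
import re

# Hypothetical interpretation rules: (pattern, canonical value).
# The first matching rule wins, so order matters.
RULES = [
    # Blank or placeholder entries that users type to get past a mandatory field.
    (re.compile(r"^(n/?a|none|unknown|tbc|-+)?$", re.IGNORECASE), None),
    (re.compile(r"^(uk|u\.k\.|united kingdom|great britain|gb)$", re.IGNORECASE), "GB"),
    (re.compile(r"^(usa?|u\.s\.a?\.?|united states( of america)?)$", re.IGNORECASE), "US"),
]


def interpret_country(raw):
    """Return (value, matched).

    matched is False when no rule applies, in which case the trimmed
    input is returned unchanged so that a person can look at it.
    """
    text = raw.strip()
    for pattern, canonical in RULES:
        if pattern.match(text):
            return canonical, True
    return text, False
```

Calling `interpret_country(" U.K. ")` gives `("GB", True)`, while an entry the rules have never seen, such as `"Blighty"`, comes back as `("Blighty", False)` and still needs a human.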

Machine Learning, however, may offer an answer. If you can get an SME to provide some examples and use them to train a neural network, you can quickly develop the facility to improve data quality. The machine learning tool should clean up data where there are clear rules and identify areas where there is uncertainty or conflict in interpretation. These areas require a relatively small amount of effort to put right, which then improves consistency. The more data and examples you feed to a machine learning system, the better it generally becomes, so you get ever improving interpretation of dirty data. This is quick too.
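The workflow described above (train on SME examples, clean what is clear, flag what is uncertain) can be sketched in a few lines of Python. This is only a sketch under stated assumptions: to keep it self-contained it uses a small naive Bayes text classifier rather than a neural network, and the example descriptions, the category labels, the names `TinyTextClassifier` and `clean`, and the 0.8 confidence threshold are all invented for illustration.

```python
import math
from collections import Counter, defaultdict


def tokens(text):
    """Split a free-text entry into lower-case words."""
    return text.lower().split()


class TinyTextClassifier:
    """Multinomial naive Bayes with add-one smoothing.

    A simple stand-in for the neural network mentioned in the text.
    """

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word counts
        self.label_counts = Counter()            # label -> number of examples
        self.vocab = set()

    def train(self, examples):
        """examples: iterable of (text, label) pairs supplied by the SME."""
        for text, label in examples:
            self.label_counts[label] += 1
            for tok in tokens(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)

    def predict(self, text):
        """Return (best_label, probability assigned to that label)."""
        total = sum(self.label_counts.values())
        log_scores = {}
        for label, n in self.label_counts.items():
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            score = math.log(n / total)
            for tok in tokens(text):
                if tok in self.vocab:  # ignore words never seen in training
                    score += math.log((self.word_counts[label][tok] + 1) / denom)
            log_scores[label] = score
        # Turn log scores into probabilities that sum to 1.
        top = max(log_scores.values())
        weights = {lab: math.exp(s - top) for lab, s in log_scores.items()}
        best = max(weights, key=weights.get)
        return best, weights[best] / sum(weights.values())


def clean(classifier, raw_values, threshold=0.8):
    """Label confident cases automatically; queue the rest for SME review."""
    cleaned, needs_review = [], []
    for raw in raw_values:
        label, confidence = classifier.predict(raw)
        if confidence >= threshold:
            cleaned.append((raw, label))
        else:
            needs_review.append((raw, label, confidence))
    return cleaned, needs_review


# Hypothetical SME-labelled examples.
SME_EXAMPLES = [
    ("heavy pitting on plate", "corrosion"),
    ("rust scale on frame", "corrosion"),
    ("pitting and rust near weld", "corrosion"),
    ("paint flaking on plate", "coating"),
    ("coating breakdown near weld", "coating"),
    ("paint blistering on frame", "coating"),
]
```

After `clf = TinyTextClassifier()` and `clf.train(SME_EXAMPLES)`, calling `clean(clf, ["rust pitting", "on plate near weld"])` labels the first entry as corrosion and sends the second, which contains no distinguishing words, to the review list. Each correction the SME makes to the review list can be added to the examples and the classifier retrained, which is how the interpretation improves over time.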

I found this out from experience at a previous company, where we used machine learning to interpret and categorise thousands of thickness readings taken from hull plates around different parts of the ships being inspected, so that they could be analysed by structural analysis experts. A manual process which used to take a highly skilled engineer several days to complete (without creating any value) was reduced to less than 20 minutes, with improved accuracy. This meant improved job satisfaction, higher productivity, better quality of output and a more responsive (i.e. quicker) service for the client.

So there you have it. Machine learning can bring some very quick gains with quite simple applications. You don't need complicated Deep Learning solutions to deliver value, and even simple machine learning can vastly improve your BI and BD efforts where you have to live with dirty data.
