Data underpins most of today's popular technologies, and big data refers to datasets too large to be handled by traditional databases. Big data is high-velocity, high-variety and high-volume, requiring management and processing methods that differ from traditional data management technologies.
We are collecting more and more data from many different sources and in all manner of formats. This data enables us to make better predictions about future patterns, but it also presents management and analysis challenges.
Artificial intelligence is the simulation of human intelligence by machines. Though these technologies are still in their early stages, big data is being used to feed AI programs and make them more viable. Access to more data than ever before means we can develop smarter machines.
At the moment, this is resulting in virtual personal assistants, smart home technologies such as Google Home and Amazon Echo, and Google DeepMind, whose AlphaGo program beat the world Go champion.
Computer game systems can monitor user behaviour by analysing natural language in the in-game chat, and serve a penalty to deter abusive behaviour. Google Translate is approaching the accuracy of human translators by analysing sentence patterns in millions of documents.
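As a rough illustration of the chat-moderation idea, the sketch below scores messages against a small blocklist and serves a penalty once a player crosses a threshold. The word list and threshold are hypothetical; production systems use trained language models rather than keyword matching, but the score-then-act flow is similar.

```python
# Minimal sketch of keyword-based chat moderation (hypothetical word list).
# Real systems use trained language models, but the flow is similar:
# score each message, then act once a player crosses a threshold.

ABUSIVE_TERMS = {"noob", "idiot", "trash"}  # illustrative only
WARNING_THRESHOLD = 3

def score_message(message: str) -> int:
    """Count abusive terms appearing in a chat message."""
    words = message.lower().split()
    return sum(1 for w in words if w in ABUSIVE_TERMS)

def moderate(messages):
    """Return 'penalty' if a player's total abuse score crosses the threshold."""
    total = sum(score_message(m) for m in messages)
    return "penalty" if total >= WARNING_THRESHOLD else "ok"

print(moderate(["gg everyone"]))                      # ok
print(moderate(["you idiot", "trash team", "noob"]))  # penalty
```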
User focus in content creation
Leveraging big data for content creation is still set to be a big trend. For example, Netflix produces programmes based on data collected from their users’ viewing preferences.
The US version of House of Cards was commissioned on the back of data showing that a large pool of Netflix users liked the British House of Cards, Kevin Spacey films and the director David Fincher. That insight played a large role in commissioning the Fincher-directed series starring Spacey, which proved very popular.
The company also awarded a $1 million prize (the Netflix Prize) to the team that improved the accuracy of its recommendation system by 10%.
Moving to the cloud
Data processing is moving to the cloud, instead of being conducted on-site. This can be done with Amazon’s data warehouse, Google’s data analytics service or IBM’s cloud platform, for example.
Popular open source data analytics tools like Hadoop were designed to work on clusters of in-house machines but they can also be deployed in the cloud.
This means that data and analytics operations are more easily scalable, as cloud hosting is more flexible: companies can trial data-processing workloads in the cloud before committing to them. It also eases the pressure on companies to find the technical talent needed to run in-house clusters.
There are still technical challenges with hosting any kind of data in the cloud, however, and big data in particular, such as the need for a very high-speed internet connection to handle the high velocity of incoming data.
Cybersecurity and data privacy challenges
Storing more data creates even more opportunities for hackers to exploit personal user details. It's just as well that the UK government has announced £1.9 billion to help protect companies and their customers from cyber attacks.
High-profile cyberattacks, such as the LinkedIn data breach that affected 117 million users, highlight the urgent need for governments and businesses to protect their data. Consumers are becoming more aware of the data that they allow to be collected and are demanding better protection.
New European Union regulation in the form of the GDPR (General Data Protection Regulation) requires companies to improve the security around the data they collect and, perhaps more importantly, to gain explicit consent from users before collecting it.
Data self-service platforms
Our ability to analyse big data is going to develop further with platforms that enable business users to make sense of data. Data self-service platforms are a viable alternative for companies with no in-house data analysts or data scientists, and include platforms such as Trillium and Hunk.
These have arisen partly in response to businesses demanding real-time responses to their data, as well as a skills shortage in data scientists. Gartner predicts that the global business intelligence and analytics software market will grow to £14.7 billion in 2017.
Move to stream processing
Traditionally, companies have tended to batch-process data. Continuous analysis is now required as businesses recognise the need to act on data as soon as it arrives. Speed is crucial in operations such as fraud detection, trading and system monitoring.
Stream processing handles high volumes of data in real-time as it is being collected by the server. It was first used in the finance industry when the stock exchange moved from floor-based trading to electronic trading and is now commonplace.
For example, fraud detection systems process millions of card transactions in real time, so companies can detect potential fraud through analysis of spending patterns. The electricity generators and distributors monitoring hundreds of thousands of homes and businesses in real time can use stream processing to ensure adequate availability in the right locations.
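The fraud-detection pattern above can be sketched in miniature: each transaction is examined the moment it arrives rather than waiting for a nightly batch. The card IDs, amounts and spending limit below are hypothetical, and a Python generator stands in for a real stream source such as a Kafka topic.

```python
# Sketch of stream processing for fraud detection: each transaction is
# examined as it arrives rather than in a nightly batch. Real deployments
# use engines like Kafka Streams or Flink; a generator stands in here.

from collections import defaultdict

SPEND_LIMIT = 1000.0  # hypothetical per-card spending limit

def transaction_stream():
    """Stand-in for a live feed of (card_id, amount) events."""
    yield ("card-1", 50.0)
    yield ("card-2", 400.0)
    yield ("card-1", 980.0)   # pushes card-1 over the limit
    yield ("card-2", 300.0)

def detect_fraud(stream, limit=SPEND_LIMIT):
    """Flag a card the moment its running total exceeds the limit."""
    totals = defaultdict(float)
    flagged = []
    for card, amount in stream:
        totals[card] += amount
        if totals[card] > limit:
            flagged.append(card)
    return flagged

print(detect_fraud(transaction_stream()))  # ['card-1']
```

Because the running totals are updated per event, a suspicious card is flagged mid-stream instead of hours later in a batch job.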
Collecting more data
It's just as well that businesses are moving to the cloud, as the amount of data we collect is only going to keep on growing. The world is set to create 180 trillion gigabytes (180 zettabytes) of data annually by 2025, according to International Data Corporation (IDC). The Internet of Things will generate vast amounts of data in many formats, including sensor data, video, audio and natural speech. Companies will need to leverage this to improve services for their customers.
The smart home market has the ability to amass even more data than we previously imagined. Our devices know what time we get up, how we operate our homes, and which groceries we buy. Our smart cars know exactly where we travel to and our fuel consumption. This data can be used to improve sustainability, for example by turning off lights when we’re not using them, or efficiency by diverting resources to where they are most needed.
Companies will also be making better use of their 'dark data': data they collect but do not currently leverage for any business purpose, which can even be paper-based. This could include analysing salespeople's expense and travel reports to gauge productivity, for example.
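A minimal sketch of that expense-report idea, assuming the paper reports have already been digitised into simple records (all names and figures below are hypothetical):

```python
# Sketch of mining 'dark data': digitised expense rows analysed for a
# simple productivity signal (travel spend per deal closed).
# Field names and figures are hypothetical.

expense_rows = [
    {"salesperson": "A. Jones", "travel_cost": 200.0, "deals_closed": 4},
    {"salesperson": "B. Smith", "travel_cost": 500.0, "deals_closed": 2},
]

def cost_per_deal(rows):
    """Travel spend divided by deals closed, per salesperson."""
    return {r["salesperson"]: r["travel_cost"] / r["deals_closed"]
            for r in rows}

print(cost_per_deal(expense_rows))
# {'A. Jones': 50.0, 'B. Smith': 250.0}
```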
In the enterprise, there will be an increase in what are known as data lakes, making it easier to draw out business insights. Data lakes are centralized repositories that hold a vast amount of data in its native format.
The advantages of data lakes include low-cost, easily scalable storage for large amounts of data and, because the data is kept in its raw form, support for multiple programming languages and frameworks.
They have the potential to help organisations work in a more coherent manner by storing all organisational data in one place. For security, users can be issued different permissions so that no single user can access all of the data.
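The core idea, storing heterogeneous data in its native format in one place with a catalogue of what is where, can be sketched as follows. The directory layout and catalogue fields are illustrative, not any particular product's API.

```python
# Sketch of a data lake's core idea: heterogeneous data stored in its
# native format in one place, with a catalogue recording what is where.
# Paths, names and formats here are illustrative.

import json
import os
import tempfile

lake_root = tempfile.mkdtemp(prefix="lake_")
catalogue = []  # one entry per stored object, whatever its format

def ingest(name: str, payload: bytes, fmt: str):
    """Store raw bytes unmodified and record them in the catalogue."""
    path = os.path.join(lake_root, name)
    with open(path, "wb") as f:
        f.write(payload)
    catalogue.append({"name": name, "format": fmt, "path": path})

# Two very different formats land side by side, untransformed.
ingest("clicks.json", json.dumps({"page": "/home"}).encode(), "json")
ingest("sensor.csv", b"time,temp\n0,21.5\n", "csv")

print([entry["format"] for entry in catalogue])  # ['json', 'csv']
```

Schema and structure are only imposed later, when a consumer reads an object, which is what distinguishes a lake from a conventional warehouse.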
Data is going open source
Big data is dominated by open source tools such as Hadoop and the data mining tool RapidMiner. It's no surprise that the future of big data is open source.
Sharing data is going to be important in developing new technologies. For example, companies like Google and Facebook share their data to progress the field of deep learning.
Google is also partnering with the medical field: one London NHS trust has shared patient details with its AI subsidiary DeepMind to test an alert system for acute kidney injury.
Big data has longevity and is much more than just a tech industry buzzword. It must not be seen as an isolated area of tech since it underpins the development of many areas of modern technology.
Big data drives the progress behind breakthrough technologies that are transforming our environment, such as machine learning and the Internet of Things, as well as how modern security agencies tackle crime.
The option to move to the cloud enables more companies to take advantage of big data in a way that is scalable. Making use of more data, and open data-sharing, is the future.