Tuesday, June 11, 2019

Implications of Big Data for Society

Big data can help shape society by finding out the opinion of the population to see which policies are preferred by the majority. This can influence government decisions and change the way the world works.
Medicine and healthcare could be improved with the use of big data, because looking at past treatment results and how well they worked can help predict what will work best for future patients. This may raise the population's average age, as people are likely to live longer.
The rate of crime may be lowered because, with big data analytics, we would have a better understanding of what makes crime more likely to happen, allowing us to prevent it. This would result in a safer society to live in, with a potentially lower death rate.

Even though big data may be good for society overall, these improvements could negatively affect some people's lives as individuals. For example, if crime rates were lower then a lot of police jobs might be cut, leaving many people without a job, and this could impact the economy.
People may not like being shown tailored advertisements based on big data, as they could feel that they are being spied on by companies and manipulated for their money.
Lots of people want to stay private online, and advances in big data now make this very difficult. There may be a lack of privacy in society because of big data.

Personal Implications on the Use of Big Data

Advantages/ Disadvantages

The main advantage of big data for you personally is that you will see more information that is related to you and your interests based on previous browsing history and searches. You may enjoy seeing content which is more relevant for you rather than unrelated content which was not based on your data.

A disadvantage of big data being used on you is that your personal data is stored and may be shared with third parties that you did not know had your information, and you may not have consented to the data being shared with them. The information can be used without your consent, and companies can use knowledge about you to manipulate you, e.g. by showing you tailored ads that serve a specific agenda.

Wednesday, June 5, 2019

Strategies for Limiting Negative Effects of Big Data

Negative effect - Your personal information being shared

How this can be limited - You can limit the amount of personal information you post online, which protects your data by making it unavailable to anyone. You can also be aware of what information is stored about you by reading the terms and conditions and privacy statements of companies and websites. Companies should be transparent with users about what they store and how they use personal information.


Implementing rules for the use of big data - there could be stricter laws on how big data should be handled by organisations and companies. It should be made very clear what is being stored about people and how it is being used. People should be able to opt in or opt out of personal data storage by companies. The most important way to limit the negative effects of big data is for companies to be transparent about how they use data.

Limitations of Predictive Analytics

Why don't we use big data all the time?

Incorrect data:

Most big data is unstructured, and unstructured data is difficult to make sense of. It is very likely that the system will not be sophisticated enough to understand it fully. The data may be interpreted wrongly, resulting in nonsensical output, or it may contain misspelled words or grammatically incorrect sentences that cannot easily be used.

Incorrect/missing data:

Another problem is that a person in the data set may have recorded information incorrectly, so it is not a true data set, and analysis of the incorrect data is meaningless for real life. Data produced by people can be biased, which leads to inaccurate predictions. People may have been paid to give out certain information which can be incorrect, e.g. sponsored content is often not genuine, and this can reduce the effectiveness of the predictions as we can't tell whether the opinions were true or not.
***************************************************************************
Correct data:


Another thing that affects big data analytics is that an unknown third variable may be part of the reason the data is the way it is. If we have a graph with two variables, we are assuming that only those two variables affect the result, when in real life there are often multiple variables affecting an outcome. Predictive analytics cannot work out what the third variable is unless it is in the data. There are almost always unexpected things that we would not be able to predict. A major limitation of predictive analytics is that data cannot predict human feelings, and human feelings often determine the outcome. Human emotions can also lead to data being taken out of context, which affects the reliability of the data and can therefore lead to inaccurate predictions.



Types of Visualisation of Big Data

A map is a good way to see visually where something is. It makes it very easy to be able to see how one place relates to another place and the distance between places.

A graph could be a line graph, bar graph, scatter graph etc. Graphs are good for seeing trends and relationships, and for visualising numerical data.

Charts can also be used; an example of a chart is a pie chart. This is good for quickly visualising things like percentages, to see how something is divided.

A table is a nice way to show structured information. Each row and column has a meaning, and everything in the table has been placed in the correct category. It looks more organised than scattered data.

Future Applications of Big Data - Machine Learning

What is machine learning - machine learning is when a computer learns from data, in a way loosely similar to how humans learn from experience, and it is one way of creating Artificial Intelligence (AI). The computer learns from past experience by saving data and looking at past data to help decide what to do in its current situation. If a computer has enough data it can look at how humans act and respond to things and then do the same.

How is big data used with machine learning - intelligent computers would be able to identify patterns in big data and make more accurate predictions from them.

Applications of Big Data Use for Society

Changing public opinions for elections -
Big data can be used by the government to collect data about the opinions of the public. This knowledge can then be used to try to change public opinion by sending targeted messages to specific groups of people, to manipulate what they think about certain topics. For example, in the 2016 US election people who supported Clinton were shown targeted ads, through means such as social media posts, to try to sway them to support Trump instead. This type of knowledge about the opinions of the public makes the government very powerful.


Getting public opinions for policies -
The government can collect the opinions the public has about government policy. This allows the government to see what the public wants, whether they like the current policy and what they want to see changed. This can greatly influence government policy, as the government is put under pressure to change policies. For example, the murder of a 6 year old by a 17 year old caused public outrage, which then led to a change in policy by a judge.

Future Applications of Big Data - Crime Prevention

Big data can be useful in stopping crimes, as it can be used to predict who is more likely to commit a crime and in which situations and circumstances a crime is likely to occur. This can help prevent crime, as police would be better informed about whether a crime is likely to occur and could take precautions to stop it from happening.


It is not completely reliable to use big data to stop crime, as there are disadvantages - we can't rely solely on big data because we would just be making guesses as to which crimes are going to happen, and we cannot be sure. Big data is also unlikely to remove the unpredictability of crime, as many crimes come as a surprise. It may also wrongly put innocent people under suspicion if they appear to fit the profile of an average criminal even though they are law abiding.

Applications of Big Data Use for Business

People's opinions (on current/upcoming products)
We can quickly gather opinions of products from a large group of people. This aids businesses in decision making for the future, as they can tweak the product to suit the opinions gathered from the public. This can lead to more sales, as the product can be suited to people's opinions, making them more likely to buy it. Finding out people's opinions can also help businesses understand what is popular in general.

Targeted Advertising - targeted advertisements are adverts shown to people because they are more likely to be relevant to that person. The advertiser can slightly change what the advert says so that the person seeing it will like it. It is designed to make sure money for ads is being spent wisely. It shows you the products you're most likely to be interested in, so there is more chance you will engage with the ad and ultimately spend money. Businesses can purchase data so they know what you are interested in and can show ads specialised for you.



Applications of Big Data Use for Science

Live/Historic Data For Research - live data would be data produced within the past week, and historic data can go back hundreds of years. A lot of data is available for scientists to analyse. This can help science move forward, as we find out the most efficient way to do things and can make discoveries. For example, pharmaceutical companies can use big data to improve their medicines. They could find out whether a medicine worked by looking at Facebook posts from users who talk about it.

Finding and collating research papers (and their findings) - some studies are quite small and some are not public. If you want to find out as much information as you can from studies, the best way to find out is word of mouth. Looking at all the small studies spoken about on social media can give you a better idea of the wider population.

Data Mining Methods

Data is extracted in order to find some sort of pattern in the data set. There are different levels of analysis we ca

Association rule mining: this is when the program tries to see whether there is a relationship between two or more variables in the data. It helps find the probability of relationships between variables in large data sets by forming association rules from the relationships found.
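As a rough illustration, the strength of an association is often measured with two numbers, support and confidence. The sketch below uses a made-up list of shopping baskets; the items, the rule "bread -> butter" and the function names are all assumptions for illustration only.

```python
# Hypothetical shopping baskets; each set is one customer's purchase.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
    {"bread", "butter", "jam"},
]

def support(items, baskets):
    """Fraction of baskets that contain every item in `items`."""
    hits = sum(1 for basket in baskets if items <= basket)
    return hits / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Of the baskets containing `antecedent`, the fraction that also contain `consequent`."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)

rule_support = support({"bread", "butter"}, baskets)
rule_confidence = confidence({"bread"}, {"butter"}, baskets)
print(f"support = {rule_support:.2f}, confidence = {rule_confidence:.2f}")
```

Here 3 of the 5 baskets contain both bread and butter (support 0.60), and 3 of the 4 baskets with bread also contain butter (confidence 0.75).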

Correlation analysis: this is used to see how strong the relationship between two variables is. The correlation can be either positive or negative. Positive correlation is when y goes up as x goes up, whereas negative correlation is when y goes down as x goes up (on a graph).
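One common way to put a number on this is the Pearson correlation coefficient, which is close to +1 for a strong positive correlation and close to -1 for a strong negative one. The sketch below is a minimal version written by hand; the temperature and sales figures are made up for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up data: daily temperature against sales of two products.
temperature = [5, 10, 15, 20, 25, 30]
fan_sales = [1, 20, 90, 180, 300, 400]   # rises as temperature rises
coat_sales = [50, 42, 30, 21, 9, 2]      # falls as temperature rises

print("fans: ", round(pearson_r(temperature, fan_sales), 2))
print("coats:", round(pearson_r(temperature, coat_sales), 2))
```

The first value comes out positive and the second negative, matching the two cases described above.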

Regression analysis: this allows you to study the relationship between two or more variables. In this type of analysis one variable (the dependent variable) depends on one or more other variables. Points are plotted on a graph and a regression line (or line of best fit) is drawn, and this can help predict the future as we see which direction the line is heading.
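A line of best fit can be calculated rather than drawn by eye. The sketch below uses the ordinary least squares formulas for a straight line with one input variable; the yearly sales figures and the name `fit_line` are made up for illustration.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up yearly sales figures.
years = [2015, 2016, 2017, 2018]
sales = [100, 120, 140, 160]

slope, intercept = fit_line(years, sales)
prediction_2019 = slope * 2019 + intercept  # follow the line one year further
print(f"predicted sales for 2019: {prediction_2019:.0f}")
```

Because the made-up sales go up by exactly 20 each year, the line predicts 180 for 2019. Real data would be noisier and the prediction less certain.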




Monday, June 3, 2019

Characteristics of Big Data Analysis 2

The big data analysis system should be powerful enough to work with the data iteratively. This means the processes can keep looping until the desired result is produced. The system must also be able to handle a lot of attributes, because big data comes in large sizes with hundreds or even thousands of attribute types in a data source, whereas before a data source may have held many records with the same attributes. Big data is often too complicated to make sense of right away in its raw form, so the analysis system must be programmatic, meaning it can process the data through programs. It is advised that the system should be able to connect to other computer systems via the cloud. This is because if several computers can cooperate to process the same big data set, it will reduce the processing time.


Wednesday, May 29, 2019

Types of Problem suited to Big Data Analysis

predictive analytics


Predictive analytics is when we use big data to look for patterns which can help us predict what will happen in the future. For example, companies and investors can use big data analytics to predict future prices by looking at past information, such as sales in previous years. This can help them in business decision making, as they can estimate what is likely to happen.


diagnostic analytics


Diagnostic analytics is similar to predictive analytics in that it predicts a value, however it does not necessarily predict future values. It can use any two variables which can be compared. Using the existing points on a graph, the location of an unplotted value can be predicted by estimating where it would most likely be. For example, the number of sales of handheld fans in a shop could depend on the temperature that day. Using the information from previous sales and temperatures, we can estimate how many fans are likely to be sold at a given temperature. If the temperature is 5 degrees you may expect 1 sale. If the temperature is 30 degrees you might expect 400 sales.
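The fan example can be sketched in a few lines of Python. This is only an illustration: it assumes sales rise in a straight line between the two figures quoted above (1 sale at 5 degrees, 400 sales at 30 degrees), which real sales are unlikely to do exactly, and the function name `estimate_sales` is made up.

```python
def estimate_sales(temperature, known_points):
    """Estimate sales at `temperature` by linear interpolation between two known points."""
    (t_low, s_low), (t_high, s_high) = known_points
    fraction = (temperature - t_low) / (t_high - t_low)
    return s_low + fraction * (s_high - s_low)

# The two (temperature, sales) figures from the example above.
known_points = [(5, 1), (30, 400)]

estimate_20 = estimate_sales(20, known_points)
print(f"estimated fan sales at 20 degrees: {estimate_20:.0f}")
```

With more recorded days, a line of best fit through all of the points would usually give a better estimate than using just two of them.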


descriptive analytics


Descriptive analytics is when we take data and map it onto a graph, then see whether there is any relationship between two variables. If a relationship is found, we say what we think the relationship is. For example, looking at the previous example, we can say that the warmer the weather is on a particular day, the more fans will be sold in the shop.


prescriptive analytics


Prescriptive analytics is when we look at the relationship and make decisions based on it and on what we can predict. It can be used to create the best outcome, because we can make predictions based on the relationship. Businesses can use prescriptive analytics to maximise profits. Healthcare professionals can use it to give people the most effective treatment. Looking back at the shop example above, we can decide that the business should buy fewer handheld fans for its stock during the cold winter and make sure there are plenty of fans stocked in the summer months to maximise profits.


Business example

An example of a company using Big Data for business is Netflix. They have over 100,000,000 subscribers, so they have a large pool of data to analyse.

Netflix uses data from each user to show each user their own recommended shows. They do this by looking at the user's previous watch history and their search history to help predict what the user may be interested in, therefore increasing the likelihood that the user will watch the recommended content. This will increase Netflix's profit as it is more likely that more people will be watching for longer because they have been recommended shows they will like.
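The general idea of recommending from watch history can be sketched as below. This is not Netflix's actual method, which is not described here; it is a deliberately simple stand-in that recommends unwatched titles from the user's most-watched genre, and every title, genre and function name is made up.

```python
from collections import Counter

# Hypothetical catalogue mapping each title to a genre.
catalogue = {
    "Planet Deep": "documentary",
    "Ice Worlds": "documentary",
    "Volcano Hunters": "documentary",
    "Laugh Track": "comedy",
    "Night Shift": "drama",
}

def recommend(watch_history, catalogue):
    """Recommend unwatched titles from the genre the user has watched most."""
    genre_counts = Counter(catalogue[title] for title in watch_history)
    favourite_genre, _ = genre_counts.most_common(1)[0]
    return [
        title
        for title, genre in catalogue.items()
        if genre == favourite_genre and title not in watch_history
    ]

watch_history = ["Planet Deep", "Ice Worlds", "Laugh Track"]
recommendations = recommend(watch_history, catalogue)
print(recommendations)
```

A real system would draw on far more signals (searches, ratings, what similar users watched), but the principle of predicting from past behaviour is the same.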

Characteristics of Big Data Analysis

A system dealing with big data must be able to support all sorts of data, both structured and unstructured, including text and video formats. This is because most big data is not in a traditional form like a table; it is mostly unstructured. The system also needs to be able to process the data in real time. This is because a lot of data is flowing in constantly, so the system needs to process it before it gets backlogged. The system must be able to store very large amounts of data. Big data can be up to petabytes in size, so analytics companies must have storage systems in place such as data warehouses or cloud storage.

Tuesday, May 21, 2019

Value of Big Data

Big data is so valuable because it contributes to business globally. Businesses use big data to aid their decision making, and it ultimately shapes the future of their business. It gives them insight into their customers, so big data can be used to make predictions. If businesses have insightful information they can tailor their products/services to suit customers, or they can show targeted adverts to specific groups of people to increase profits. Data can also be sold to other companies who seek information about customers (even though this is sometimes illegal). Companies which hold a large amount of big data on their website users are therefore more valuable companies, for example Facebook's net worth is £105 billion despite only having a yearly profit of £5 billion. This shows that big data is valuable even though it is intangible. Due to increasing access to the internet it is likely that more big data will be produced globally, and this will increase the value of big data as it will become even more informative. The big data market value is predicted to increase to $118.52 billion by 2022 - a 5x increase from 2015.

History of Big Data 1

Before computers came into business use in the 1960s, the only method of collecting and recording information from the population was on paper. People would need to manually hand out forms for people to fill out, or someone would go round surveying people. It was hard to collect large amounts of information because it took a lot of people and a lot of time to collate the data, as it all had to be done manually. All the data collected was structured and planned ahead, e.g. the questionnaires were written by someone who wanted to find specific information.

During the 1960s computers began to be used by businesses. They were mostly inaccessible to the majority of the population as they were very expensive and very large. They were also impractical, as they were hard for the average person to use. At this time they were only adopted by large companies who could afford them, but the computers did not have much use as the technology was not very advanced. Data was still mostly processed manually on paper.

In 1975 the first home PCs were made. Still not many people had them as they were expensive. There was still no internet, but information could now more easily be stored electronically in files on the computer's local storage. Information was still analysed manually on paper, as computers were not mainstream technology yet.

1983 was when the internet was created. Information was beginning to be analysed on the internet, however not many people had a connection to the internet as it was not mainstream yet. More and more information was being stored on computers though, as we moved on from using paper to process data.

From 1995-2000 the internet was growing rapidly with the birth of social media and e-commerce websites. Lots of people were now using the internet, so data was being produced by a lot of people. However there was not a lot of data, and it was not very meaningful data. There was no technology yet to make sense of the unstructured data. Data was stored almost entirely in databases rather than on paper.

2010-present is when the use of big data and the internet has really exploded. Now we are giving information unconsciously just by using the internet, especially through social media and online shopping. Everything we do on the internet can be analysed with technology like Hadoop. Most data collected from people is unstructured data, and 3.2 billion people on the planet use the internet, meaning there is so much information which can be analysed. Almost nothing is stored or processed using paper - computer, mobile and internet technology has almost completely eliminated the need to use paper.

Hadoop

Hadoop is a framework that allows you to break up tasks to deal with big data.
Hadoop is free, open source software that allows different computers dealing with the same data to work together.
How it works:
A large amount of data is broken into smaller chunks, which are spread across a group of computers called a 'cluster'. The data is split up depending on keys, so that data with the same key is sorted together. As there is a lot of data, several computers may be required to process all of it. Each computer processes the chunk of data it has and outputs its results, and the outputs are then brought together and combined. Hadoop allows the processing of big data to be much quicker because several computers can work together on the same data.
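The split, process and combine idea can be imitated in plain Python. The sketch below does not use Hadoop or its API; it runs on one machine and only pretends that each chunk goes to a different computer. The function names and the sample posts are made up, and counting words is just a convenient example task.

```python
from collections import defaultdict

def split_into_chunks(records, n_chunks):
    """Split the records into chunks, one per (pretend) computer."""
    return [records[i::n_chunks] for i in range(n_chunks)]

def map_chunk(chunk):
    """Each computer turns its chunk into (key, 1) pairs; here the key is a word."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]

def shuffle(mapped_chunks):
    """Group the pairs from every computer by key."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups

def reduce_groups(groups):
    """Combine the values for each key into one final result per key."""
    return {key: sum(values) for key, values in groups.items()}

posts = ["big data is big", "data is valuable", "big data grows"]
chunks = split_into_chunks(posts, n_chunks=2)
mapped = [map_chunk(chunk) for chunk in chunks]  # would run in parallel on a real system
word_counts = reduce_groups(shuffle(mapped))
print(word_counts)
```

On a real system the chunks would be processed at the same time on different machines, which is where the speed-up comes from.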

Limitations of Traditional Data Analysis

Why databases (RDBMS) couldn't handle big data....

Before today's technology, databases were used to store data, but this method could not continue for big data because big data is too large for normal databases to handle. Big data is very large - it can be up to petabytes in size. Traditional databases were not designed to hold this vast amount of information. They also cannot easily extract information from unstructured data, and most big data is unstructured. It only makes sense to hold data in a database if it is structured, and it is less common to find structured data ready for a database. Big data usually needs to be processed in real time because there is so much of it. Databases are too slow to keep up with this speed of information flow.

Types of Statistics

Descriptive Statistics - These are statistics that summarise the actual values in a set of sample data, and they are found using calculations. For example, for a set of temperatures for a month you may want to find the average temperature or the range of temperatures. The results can be displayed using graphs to show the data visually.
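As a small illustration, the average and range can be calculated like this. The temperature readings are made up, and only one week is shown to keep it short.

```python
from statistics import mean

# Made-up daily temperatures in degrees Celsius.
daily_temperatures = [12, 15, 14, 18, 21, 19, 16]

average_temperature = mean(daily_temperatures)
temperature_range = max(daily_temperatures) - min(daily_temperatures)

print(f"average: {average_temperature:.1f}")
print(f"range:   {temperature_range}")
```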

Inferential Statistics - Inferential statistics uses a small sample to say something about the whole population. Conclusions about the population are drawn from the sample data. It says what the data probably means, however the result is an estimate, which can also be based on other information. It makes predictions about the whole population based on a sample of it. For example, in a manufacturing factory, if you measure 3 nails out of a box of 100 nails and all 3 measure 5 cm, you can infer that all 100 nails in the box probably measure about 5 cm.
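The nail example can be simulated to show that the result is an estimate rather than a certainty. In the sketch below the box of nails is generated at random (the lengths are invented, not measured), 3 nails are picked, and their mean is used as the estimate for the whole box. The standard error gives a rough idea of how far off the estimate might be.

```python
import random
from statistics import mean, stdev

random.seed(1)  # fixed seed so the sketch gives the same result each run

# Hypothetical box of 100 nails, each about 5 cm long with small variations.
box = [round(random.gauss(5.0, 0.05), 2) for _ in range(100)]

sample = random.sample(box, 3)          # measure only 3 nails
estimate = mean(sample)                 # estimate for the whole box
standard_error = stdev(sample) / len(sample) ** 0.5

print(f"estimated length: {estimate:.2f} cm (standard error {standard_error:.2f})")
print(f"true mean of box: {mean(box):.2f} cm")
```

A bigger sample would normally give a smaller standard error and so a more trustworthy estimate.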





Types of Data - structured / unstructured



Structured Data - this is data which has been collected in a fixed format designed to capture specific information. Examples of structured data would be information taken from an application form or a table which someone has made. The information which is wanted has been defined beforehand, e.g. someone who wrote the application form would be looking for specific pieces of information from people.

An advantage of structured data is that it is fast and easy to take information from, as it is laid out in a structured manner. There is no unnecessary information in the way, making it easier for the information to be analysed by a computer algorithm. A disadvantage of structured data is that it can be limited in what data is there.

Unstructured Data - this is data which has no format and so it can have a wide variety of data. Examples would be information taken from a video or a Facebook Post. The data may be ambiguous so it might be difficult to organise the data.

An advantage of unstructured data is that it can be more flexible as it does not have a structure to adhere to. There can be a lot more data within unstructured data than structured data. A disadvantage of unstructured data is that there can be a lot of unnecessary information making it difficult for the information to be analysed.
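The difference shows up as soon as a program tries to read the data. In the sketch below (all of the data is made up) the same fact, a person's age, is read straight out of a structured record but has to be searched for in a free-text post.

```python
import re

# Structured: the fields were defined in advance, like an application form.
application = {"name": "Sam", "age": 17, "town": "Leeds"}
age_from_form = application["age"]  # one direct lookup

# Unstructured: free text, like a social media post. The age has to be
# dug out with a pattern, and the pattern only works for this phrasing.
post = "Had a great day in Leeds for my 17th birthday!!"
match = re.search(r"(\d+)(?:st|nd|rd|th) birthday", post)
age_from_post = int(match.group(1)) if match else None

print(age_from_form, age_from_post)
```

If the post had been worded differently the pattern would find nothing, which is the kind of difficulty that makes unstructured data harder to analyse.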



Today there is a lot more unstructured data due to the growth of social media, which a lot of big data is derived from. Before the internet existed most data was structured, because it was easier and quicker to analyse structured data.

What is Big Data? '3 Vs'

Big Data can be described as having the '3 Vs'.

Volume: (amount) this refers to the amount of information which is collected from people. Data is being analysed from a very large number of people. For example, Facebook has over 2 billion users, from whom it collects and processes vast amounts of data. By analysing and sorting all of this data, Facebook has information about patterns and trends in human behaviour.

Velocity: (real time, batch) The data is passed in real time, meaning that as soon as a person does something online it is instantly processed. The data has to be processed very quickly due to the volume of data constantly being generated.

Variety: (structured, non structured) Structured data is made up of clearly defined data types, meaning the data can be put into categories and can be easily searched as it is in an organised format. Non structured data, on the other hand, is in a format which makes it hard to organise the information. For example, with videos or Facebook posts it is hard to work out what data can be taken from them.

Growth of Big Data

Over time big data has grown in size and there are many reasons for this. More data is being recorded because more and more people are producing data for example from their mobile phones. Technology is exponentially improving and more and more people are using up to date technology which can produce a wide range of big data. The increased use of social media such as Facebook has greatly contributed to the amount of big data there is. There now exists smart software which can make sense of unstructured data and organise it to make sense to be used.

An example of how fast big data is growing is that Facebook now generates 500+ TB of data every day, when 30 years ago 200 MB was considered a lot of data. To give an idea of how quickly it is increasing, 90% of all data was generated in the past 2 years.
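To put those two sizes side by side, the short calculation below works out how many times bigger 500 TB is than 200 MB. It assumes decimal units (1 MB = 1,000,000 bytes and 1 TB = 1,000,000,000,000 bytes).

```python
MB = 10**6    # bytes in a megabyte (decimal units)
TB = 10**12   # bytes in a terabyte (decimal units)

facebook_daily = 500 * TB   # daily figure quoted above
old_large = 200 * MB        # "a lot of data" 30 years ago

ratio = facebook_daily // old_large
print(f"{ratio:,} times larger")
```

So a single day of Facebook data is about 2.5 million times the amount that once counted as a lot.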


Wednesday, March 27, 2019

What is Big Data? 1

Big Data is a lot of data collected and stored from individual people through many channels, such as social media activity and buying activity online. The masses of data are processed and analysed by computational methods. It is so much data that it pushes the boundaries of current technology to process because of its size. The information can be used to target certain people with information such as advertising. An example of big data is Facebook likes. Facebook stores all of your activity and can paint a picture of who you are based on your likes in order to target specific adverts to you. Big data can also be used to manipulate the outcome of political campaigns by showing targeted people certain adverts to try to sway their opinion.

Sizes (Memory) involved in Big Data

The amount of data used in Big Data is extremely large - it can be 10 TB or more, up to 100 TB or more, or even petabytes.

It is so much data that it pushes the boundaries of current technology to process because of its size.