Big Data 2/6

in #education, 8 years ago (edited)

Welcome to the second entry in the Big Data series. I hope you will subscribe, interact, and Resteem. Thanks, @javf1016 :)

First Entry: https://steemit.com/business/@javf1016/big-data-1-10

The importance of Big Data has been growing in recent years, but many people do not know what it means. Several companies, such as IBM, have carried out research and gathered information so that everyone can understand what it is and why it matters. "In general terms we could refer to the trend in technology that has opened the door to a new approach to understanding and decision making, which is used to describe huge amounts of data (structured, unstructured and semi-structured) that would take too long and be too costly to load into a relational database for analysis." (Fragoso, Ricardo Barranco.

Big Data lets us take advantage of large amounts of data, generated by systems or users, that are too large to process in a conventional way. A considerable volume of information can be measured in bytes as follows:

Gigabyte = 10^9 = 1,000,000,000
Terabyte = 10^12 = 1,000,000,000,000
Petabyte = 10^15 = 1,000,000,000,000,000
Exabyte = 10^18 = 1,000,000,000,000,000,000
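
To get a feel for these magnitudes, here is a minimal Python sketch that expresses a byte count in the largest of the units listed above. The helper name `describe_size` and the sample value are assumptions made up for illustration.

```python
# Decimal byte units, matching the powers of ten listed above.
UNITS = [
    ("exabyte", 10**18),
    ("petabyte", 10**15),
    ("terabyte", 10**12),
    ("gigabyte", 10**9),
]


def describe_size(num_bytes: int) -> str:
    """Express a byte count in the largest unit that fits (hypothetical helper)."""
    for name, factor in UNITS:
        if num_bytes >= factor:
            return f"{num_bytes / factor:.2f} {name}s"
    return f"{num_bytes} bytes"


# A made-up sample value: 2.5 * 10**12 bytes.
print(describe_size(2_500_000_000_000))
```

The units are checked from largest to smallest so that the first match is always the most compact way to write the number.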

The great variety of data is another very important factor. A smartwatch, for example, can measure different values such as temperature, humidity and heart rate, giving us important information in real time. A conventional application could not analyze all of this.

Nowadays huge amounts of information are generated every second. Transactions continuously collect information on products, customers, suppliers and population data, and on top of that, companies such as Twitter and Facebook generate data on the order of petabytes, as millions of photos and videos are uploaded to them every second.

Machines also play an important role in collecting data, in what we call M2M. Here we find sensors of all types, already established in various sectors. In the electrical sector, for example, they help us measure current and consumption.

Depending on the problem you are facing, we have different types of data (Fig. 1).

Fig. 1. Data types in Big Data [1]

It is not enough to know what kind of data we have collected. We must find a way to analyze it, turning data into information and then into knowledge that fits the needs for which Big Data was implemented. As we will see later, different techniques use different kinds of analysis; for the moment, the four we need are prescriptive, predictive, diagnostic and descriptive.

The four types of analysis focus on different needs. Prescriptive analysis tells us how to act (How many employees are needed?), predictive analysis tells us what will happen (How much will the company grow this year?), diagnostic analysis tells us why something happened (Why do our projects not reflect what we do?), and descriptive analysis tells us what happened (How did we reach a 100% increase in sales?). The four types have points in common that help in decision making: decisions based on these analyses give us much greater confidence that we will achieve the expected results.
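
As a rough illustration of the four types of analysis, the sketch below runs each of them on a tiny made-up data set. The numbers, the candidate budgets, and the decision rule are all assumptions, and a simple least-squares line stands in for what would be much richer diagnostic and predictive models in practice.

```python
from statistics import mean

# Made-up toy data for illustration only: monthly ad spend and sales.
ad_spend = [10, 12, 15, 18, 20, 24]
sales = [100, 110, 130, 150, 160, 185]


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Descriptive: what happened?
average_sales = mean(sales)

# Diagnostic: why did it happen? Here, how strongly sales move with ad spend.
slope, intercept = fit_line(ad_spend, sales)

# Predictive: what will happen if next month's spend is 30?
predicted_sales = slope * 30 + intercept

# Prescriptive: how should we act? Pick the candidate budget with the best
# predicted margin (sales minus spend), a deliberately simple decision rule.
candidates = [20, 25, 30]
best_budget = max(candidates, key=lambda b: slope * b + intercept - b)

print(average_sales, round(slope, 2), round(predicted_sales, 1), best_budget)
```

Each step builds on the previous one: the description summarizes the past, the fitted line offers one possible explanation, the prediction extrapolates it, and the prescription turns the prediction into a choice.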

The phases that make up the analysis of this data follow a fixed order, which helps ensure good practice. They are the following:

• Get data
• Process data
• Clean data
• Exploratory analysis
• Models and algorithms
• Products

These phases operate as a cycle, shown in Fig. 2. They will be explained in order later, when we look at the techniques and procedures that use these steps.

Fig. 2. Process in data analysis
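
The phases listed above can be sketched as a chain of small Python functions, one per phase. This is only a minimal illustration: the raw readings, the function names, and the trivial "model" (predicting the mean) are all assumptions, not a real pipeline.

```python
from statistics import mean

# Hypothetical raw sensor readings; in practice these would come from a
# file, a database, or an API.
RAW = ["21.5", "22.0", "", "bad", "23.5", "22.5"]


def get_data():
    """Phase 1: obtain the raw records."""
    return list(RAW)


def process_data(records):
    """Phase 2: parse text into numbers, marking unparseable values as None."""
    parsed = []
    for record in records:
        try:
            parsed.append(float(record))
        except ValueError:
            parsed.append(None)
    return parsed


def clean_data(values):
    """Phase 3: drop the missing values."""
    return [v for v in values if v is not None]


def explore(values):
    """Phase 4: exploratory summary of the cleaned data."""
    return {"count": len(values), "min": min(values), "max": max(values)}


def model(values):
    """Phase 5: a trivial 'model' that predicts the next value as the mean."""
    return mean(values)


def product(summary, prediction):
    """Phase 6: package the results into something a user can consume."""
    return f"{summary['count']} readings, next value expected near {prediction:.1f}"


values = clean_data(process_data(get_data()))
report = product(explore(values), model(values))
print(report)
```

Because the process is a cycle, the output of the last phase would normally raise new questions and send us back to the first phase for more data.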

So far, the need for Big Data lies in handling the velocity, volume and variety of the data collected. As the chart of data types shows, however, there are several types of data, each focused on a different area of everyday life. There is specialized data for our daily interaction on social networks such as Facebook and Twitter, for the fingerprint recognition built into some cell phones, and for the fingerprint registration used at various companies. All of it is important, but for a timely and effective analysis we must recognize what kind of data we are dealing with and, from that, how it behaves and how it should be handled.

At the beginning, Big Data is not easy to handle, and its security is paramount. How will we protect the data? How will we access it? How much of it is private? Can it be lost? These are some of the questions that come to mind when we begin to discover this ocean of information.

Focusing on computer security, the sheer amount of information generated piles up as logs, logs and more logs, which usually no one reviews or analyzes. This is considered a first-order problem.
Cyber attacks have something in common: they are designed to stay below the threshold of IDS/IPS alerts, hiding within the large volumes of data generated daily in an organization. Every attack leaves a mark in the logs. The problem we face is that, whether we attempt a comprehensive analysis of that information or a post-attack study, the information collected exceeds our capacity for real-time analysis. In addition, some companies treat log retention as a low priority.

To face this problem, we need a tool that can pre-analyze the information using rules which, depending on the subject, are either supplied with the tool or defined in advance. Organizations know that technology for this type of analysis exists, and it supports decision making and planning in response to data security incidents.
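
A rule-based pre-analysis of this kind can be sketched in a few lines of Python. The rules, the log lines, and the function name `pre_analyze` below are invented for illustration; real rules would depend on the organization's systems and log formats.

```python
import re
from collections import Counter

# Hypothetical rules: a label mapped to a regular expression.
RULES = {
    "failed_login": re.compile(r"failed password", re.IGNORECASE),
    "sql_injection": re.compile(r"union\s+select", re.IGNORECASE),
    "port_scan": re.compile(r"connection attempt .* port \d+", re.IGNORECASE),
}

# Made-up log lines for illustration.
LOG_LINES = [
    "2017-05-01 10:00:01 sshd: Failed password for root from 10.0.0.5",
    "2017-05-01 10:00:03 sshd: Failed password for root from 10.0.0.5",
    "2017-05-01 10:01:10 web: GET /item?id=1 UNION SELECT password FROM users",
    "2017-05-01 10:02:00 web: GET /index.html 200",
]


def pre_analyze(lines, rules):
    """Return the lines that match at least one rule, plus per-rule counts."""
    flagged = []
    counts = Counter()
    for line in lines:
        hits = [name for name, pattern in rules.items() if pattern.search(line)]
        if hits:
            flagged.append((line, hits))
            counts.update(hits)
    return flagged, counts


flagged, counts = pre_analyze(LOG_LINES, RULES)
for line, hits in flagged:
    print(hits, line)
print(dict(counts))
```

The point of the pre-analysis is to shrink the volume: only the flagged lines and the per-rule counts need to reach a human analyst or a more expensive analysis stage.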

Traditionally, security analysis has been carried out on packets and logs, but as the volume and speed of the data have increased, this approach has proved costly and demanding.

Glossary
IBM: International Business Machines Corp.
Twitter and Facebook: popular social networks
M2M: machine to machine
IDS: Intrusion Detection System
IPS: Intrusion Prevention System
Log: a record of events generated by a system or application

Bibliography
[1] Soares, Sunil. Not Your Type? Big Data Matchmaker On Five Data Types You Need To Explore Today. [Online] http://www.dataversity.net/not-your-type-big-data-matchmaker-on-five-data-types-you-need-to-explore-today/