Now that we have entered the era of data, you might wonder why data science is so popular, and why companies and multinationals are investing so much in it.
The answer is simple: they are making a profit from it. The field has become crucial and popular in today’s market.
Now let’s look at the definition. According to Wikipedia, data science is a multi-disciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data.
Some common applications are:
- Amazon’s recommendation engine suggests items that you are likely to buy, or offers you a couple of alternatives.
- Tesla’s self-driving cars adapt to the type of road they are driving on. On a rough patch, the car reduces its speed.
- Netflix mines your viewing data for patterns in the movies and series you watch, and recommends others you would love to watch.
- The list goes on.
People who work in the field of data science are usually called “data scientists”, although the title varies from company to company.
Now you might be wondering: this sounds awesome, so how do I become one?
Trust me, it’s tough, but not impossible.
And why is it tough?
The reason is that a data scientist has to know three main domains:
- Computer Science and IT, including working with big data
- Math and Statistics
- Subject matter expertise – Domain / Business Knowledge
If you know all three domains, brace yourself: you are a data scientist or, in my opinion, a data science unicorn.
In 2012, Harvard Business Review called data scientist “The Sexiest Job of the 21st Century”.
Today conventional items are being replaced by smart ones, such as smart watches, smart bands and smart shoes. The sensors in these devices capture data about the person using them.
On average, two to three sensors are capturing an individual’s data and storing it, probably in gigabytes (GB). By 2022, that is expected to rise to an average of 24 sensors, with storage measured in petabytes. Like I said at the start, we have entered the era of data.
Data Science Lifecycle
The data science lifecycle consists of six stages:
1. Business Understanding
In this stage, the data scientist asks the business relevant questions and defines the objectives from the answers.
The problem statement that needs to be tackled is defined at this stage.
2. Data Mining
Gather the data required to tackle the problem statement.
Ideally, the more historical data you have, the more precise the output will be.
3. Data Cleaning
Fix inconsistent data, handle outliers and treat missing values.
4. Data Exploration
At this stage, the data scientist tries to find meaning in the data.
A hypothesis is formed at this point for the problem that has to be tackled.
5. Predictive Modelling
Create models using tools such as R, SAS or Python, and train them on the available data sets.
6. Data Visualization
Communicate the findings to the stakeholders using interactive visualizations.
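
To make stages 3 to 5 a little more concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the toy fitness-band readings, the function names, the "more than 5x the median" outlier rule and the choice of a straight-line model. Real projects would normally use dedicated libraries and far more data.

```python
from math import sqrt
from statistics import mean, median

# Hypothetical toy data: (minutes_active, calories_burned) from a fitness band.
# None marks a missing reading; 5000 is an obviously faulty sensor value.
raw = [(10, 52), (20, 98), (30, None), (40, 205), (50, 5000), (60, 301)]


def clean(rows):
    """Stage 3: treat missing values and handle outliers."""
    known = [y for _, y in rows if y is not None]
    fill = median(known)
    # Replace missing readings with the median of the known ones.
    filled = [(x, fill if y is None else y) for x, y in rows]
    # Illustrative outlier rule (an assumption): drop values above 5x the median.
    return [(x, y) for x, y in filled if y <= 5 * fill]


def correlation(rows):
    """Stage 4: Pearson correlation, to check whether the two columns move together."""
    xs, ys = zip(*rows)
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in rows)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def fit_line(rows):
    """Stage 5: fit y = slope * x + intercept by ordinary least squares."""
    xs, ys = zip(*rows)
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in rows) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx


cleaned = clean(raw)
r = correlation(cleaned)
slope, intercept = fit_line(cleaned)


def predict(minutes):
    """Use the fitted line to estimate calories for a new input."""
    return slope * minutes + intercept


print(f"rows kept: {len(cleaned)}, correlation: {r:.2f}")
print(f"estimated calories for 50 active minutes: {predict(50):.0f}")
```

The hypothesis from the exploration stage here is simply "more active minutes means more calories burned"; the strong positive correlation supports it, which is why a straight line is a reasonable first model to try.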