The Pony Express began in 1860 and lasted only about 18 months. During that time, the service delivered more than 35,000 pieces of mail. Today, with the digitalization of information, the average person on earth creates 1.7 MB of information every day. If you aren’t familiar with the term MB (megabyte), don’t worry. We’ll get to that later. What is important to understand is that this is an immense amount of information. In fact, there is so much of it that the information is stored and transmitted numerically (digitally) and referred to as data. The focus of this article is data analytics vs. data science.
How does data analytics relate to data science? While the terms are sometimes used interchangeably, there are distinct differences between the two disciplines. Data science and data analytics have overlapping areas, but a breakdown of the two highlights where they differ.
Structured vs. Unstructured Data
Getting right down to basics, we can look at structured and unstructured data using the example of the Pony Express rider.
The rider carried the mail (physical documents such as letters, court documents and other information) in a leather pouch. That pouch was limited in what it could contain. Similarly, according to monkeylearn.com, only about 20 percent of digital information is structured. It is organized into sets of numbers, dates, columns and numerical sequences and fits neatly into spreadsheets and graphs because it is arranged according to similarities. Structured data tells the user what is happening.
When the rider arrived at his destination, along with the mail in his pouch, he was bursting with tales of his adventures. This is not organized information; making sense of it depends on the listener’s skill in sorting and categorizing it and in working out the timeline. Most digital data today is unstructured. That is, it doesn’t have a “predefined model” or parameter. The videos featured on Facebook and YouTube are unstructured data. So are photographs, audio files, text messages, and user interfaces and input. Both structured and unstructured data are immense fields of data, but since unstructured data represents 80 to 90 percent of all digital information, the task of making sense of it is almost unimaginable. That is why the field of artificial intelligence is ballooning as well. Humans are extremely limited in the amount of data they can process in a workday. AI and machine learning make it possible to perform millions of computations in a day instead of hundreds. Plus, while structured data tells users what is happening, unstructured data, once organized, tells users why things happen.
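To make the distinction concrete, here is a minimal Python sketch. The mail log and the rider's story are invented for illustration and do not come from any real dataset. It shows that a question about structured data is a direct operation on a known field, while unstructured text has to be given some structure (here, a simple word count) before it can be analyzed at all.

```python
import re
from collections import Counter

# Structured data: every record has the same named fields.
# (Hypothetical mail log; the figures are made up.)
mail_log = [
    {"date": "1860-05-01", "origin": "Station A", "letters": 40},
    {"date": "1860-05-11", "origin": "Station B", "letters": 60},
]

# A question about structured data is a direct sum over a known field.
total_letters = sum(row["letters"] for row in mail_log)

# Unstructured data: free text with no predefined fields (invented example).
rider_story = (
    "Left the station at dawn. The river was high, so I lost an hour "
    "finding a crossing. Changed horses twice before the river bend."
)

# Before the story can be analyzed, structure has to be imposed on it,
# here by splitting it into lowercase words and counting them.
words = re.findall(r"[a-z]+", rider_story.lower())
word_counts = Counter(words)

print(total_letters)         # 100
print(word_counts["river"])  # 2
```

Real unstructured data such as video or audio needs far heavier processing than a word count, but the pattern is the same: structure is added first, and analysis follows.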
Data science focuses on obtaining usable information from massive sets of both structured and unstructured data. One way of looking at it is getting answers about things users don’t realize they don’t know. It’s a form of solving a mystery that the user doesn’t realize is a mystery until answers are provided to questions that weren’t asked.
How are those answers found? A variety of techniques are involved, including elements of predictive analytics, machine learning, AI segmentation, statistics and basic computer science. An article in Forbes explains that data scientists put more attention on locating the right question to ask than on finding specific answers. They take the macro view, as opposed to the micro view.
Looking at existing data about past performance and current trends, data scientists make predictions about the future. In business, those predictions can be used to create strategies.
Machine learning is closely tied to artificial intelligence. Simply put, a computer performs the same operations until there is an “automatic” path formed. Then the machine uses that path to do other tasks.
The computer learns to ignore some paths and utilize others.
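One simple way to picture a machine repeating the same operation until it settles on which "paths" matter is gradient descent on a small linear model. The sketch below is an illustration under stated assumptions, not a description of any particular system: the data is invented, the target depends only on the first of two inputs, and the starting weights and learning rate are arbitrary choices.

```python
# Hypothetical training data: each input has two features, but only the
# first one actually determines the target (target = 2 * first feature).
inputs = [(1.0, 1.0), (2.0, -1.0), (3.0, -1.0), (4.0, 1.0)]
targets = [2.0 * x1 for x1, _ in inputs]

weights = [0.5, 0.5]   # arbitrary starting guess
learning_rate = 0.05   # arbitrary step size, small enough to converge here

# Repeat the same operation many times: measure the prediction error,
# then nudge each weight in the direction that reduces it.
for _ in range(200):
    gradient = [0.0, 0.0]
    for (x1, x2), target in zip(inputs, targets):
        error = weights[0] * x1 + weights[1] * x2 - target
        gradient[0] += 2 * error * x1 / len(inputs)
        gradient[1] += 2 * error * x2 / len(inputs)
    weights[0] -= learning_rate * gradient[0]
    weights[1] -= learning_rate * gradient[1]

print([round(w, 3) for w in weights])
```

After training, the first weight ends up very close to 2 and the second very close to 0. In the article's terms, the program has learned to use one path and ignore the other.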
Data Science Role
Data science enters a field of unknown data much like the Starship Enterprise entered the frontiers of space. It builds, cleans and organizes fields of data in its exploration. Using critical thinking skills to identify significant information including the reliability of the data source and the success rate of algorithms, it processes raw data. How does it do this?
One task is data wrangling. Although it sounds like something a cowboy might do with a steer, wrangling data means organizing data and cleaning it by removing extraneous or faulty data.
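Wrangling can be sketched in a few lines of Python. The records below are invented, and the function name wrangle and the cleaning rules (trim whitespace, require a name and a numeric age, drop duplicates) are assumptions chosen for illustration; real projects usually apply many more rules.

```python
# Hypothetical raw records, as they might arrive from a form or a log file.
raw_records = [
    {"name": " Alice ", "age": "34"},
    {"name": "Bob", "age": "not given"},  # faulty: age is not a number
    {"name": "alice", "age": "34"},       # duplicate once the name is normalized
    {"name": "", "age": "51"},            # faulty: missing name
    {"name": "Carol", "age": "29"},
]


def wrangle(records):
    """Return cleaned records: normalized names, integer ages, no duplicates."""
    cleaned = []
    seen = set()
    for record in records:
        name = record["name"].strip().title()
        age_text = record["age"].strip()
        if not name or not age_text.isdigit():
            continue  # remove faulty rows
        key = (name, int(age_text))
        if key in seen:
            continue  # remove extraneous duplicates
        seen.add(key)
        cleaned.append({"name": name, "age": int(age_text)})
    return cleaned


clean_records = wrangle(raw_records)
print(clean_records)
# [{'name': 'Alice', 'age': 34}, {'name': 'Carol', 'age': 29}]
```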
Another task is modeling, a process by which scientists put new data into existing models to see how well they “fit.” Scientists also gain new insights by noticing where raw data does not fall within existing parameters.
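Checking how well new data fits an existing model can be sketched as follows. Everything here is hypothetical: the linear model, the observations and the tolerance of 1.0 are made up to show the idea of comparing predictions with new measurements and flagging the points that fall outside expectations.

```python
def existing_model(x):
    """Hypothetical existing model: a straight line, y = 3x + 2."""
    return 3.0 * x + 2.0


# Invented new observations as (x, measured y) pairs.
new_observations = [(1, 5.2), (2, 7.9), (3, 11.1), (4, 25.0), (5, 16.8)]

# Residual = measured value minus what the model predicted.
residuals = [(x, y - existing_model(x)) for x, y in new_observations]

# Overall fit: the average size of the residuals.
mean_abs_error = sum(abs(r) for _, r in residuals) / len(residuals)

# Points that miss the model by more than the chosen tolerance.
tolerance = 1.0
misfits = [x for x, r in residuals if abs(r) > tolerance]

print(round(mean_abs_error, 2))  # 2.32
print(misfits)                   # [4]
```

The single misfit at x = 4 is the interesting part: it is either bad data to clean up or a sign that the existing model is missing something.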
Scientists then use the cleaned data to write programs that analysts can use.
Data analytics operates within narrower bounds, analyzing known datasets. The data analyst focuses on capturing data in raw form and processing it further to distill it into a form that can be organized. Basically, the data analyst looks to answer questions that need answers, and it is important to emphasize that the analyst has those questions from the very start, before the data is processed.
Types of Analytics
Data analysts do have much in common with data scientists, and the overlap means that people often do not have to choose between career paths. Here are four types of analytics that are often used in business environments.
Descriptive analysis studies existing data and creates ways to describe or communicate what has already happened.
Diagnostic analysis looks deeper into the data fields to understand why certain things occurred.
Used in data science as well, predictive analysis uses existing data to unearth trends and predict future happenings. It is much like a doctor’s prognosis.
Prescriptive analysis tries to identify strategies that can be used to reach goals or achieve success.
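The four kinds of analysis can be illustrated on one tiny dataset. The sketch below uses invented, deliberately tidy figures for monthly ad spend and units sold, and a hand-computed least-squares line stands in for the diagnostic, predictive and strategy-finding steps. Real analyses are rarely this clean, and a fitted line shows association, not proof of cause.

```python
# Invented, deliberately tidy figures: monthly ad spend and units sold.
ad_spend = [1, 2, 3, 4, 5]
units_sold = [12, 14, 16, 18, 20]


def mean(values):
    return sum(values) / len(values)


# Describing what already happened: the average month.
average_sold = mean(units_sold)

# A crude diagnostic step: how do sales move with ad spend?
# Ordinary least-squares slope and intercept, computed by hand.
mean_x, mean_y = mean(ad_spend), mean(units_sold)
covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(ad_spend, units_sold))
variance = sum((x - mean_x) ** 2 for x in ad_spend)
slope = covariance / variance
intercept = mean_y - slope * mean_x

# Predicting: expected sales if ad spend rises to 6.
forecast_at_6 = intercept + slope * 6

# Finding a strategy: the ad spend needed to reach a target of 30 units.
target_units = 30
spend_needed = (target_units - intercept) / slope

print(average_sold, slope, forecast_at_6, spend_needed)
# 16.0 2.0 22.0 10.0
```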
Explosion of Data
Earlier, a promise was made to you, the reader, that pesky terms like “byte” and “megabyte” (MB) would be explained, and a figure was given: every day the average person creates 1.7 MB of data. This comes in the form of text messages, searches and other data use. The website geeksforgeeks.org explains that the memory a computing system possesses is measured in terms of bytes.
A byte is the basic measurement. It is a unit made up of eight smaller units called “bits” (binary digits), and it is the amount most often used to designate a single letter or number. A kilobyte, roughly one thousand bytes, is the next largest unit and is used to express the size of small files. When referencing a computer system’s memory capacity, however, most people use the term megabyte, which is one million bytes. There are other designations for larger files and systems.
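The units can be checked with a little arithmetic in Python. This sketch assumes the decimal convention used above (one megabyte is one million bytes) and the standard eight bits per byte, and it uses the article's 1.7 MB figure as its input. The sample text is arbitrary.

```python
BITS_PER_BYTE = 8
BYTES_PER_KILOBYTE = 1_000       # decimal convention
BYTES_PER_MEGABYTE = 1_000_000   # "one million bytes", as in the text

# A plain English letter, digit or space takes one byte in UTF-8.
sample_text = "Pony Express"
sample_bytes = len(sample_text.encode("utf-8"))
sample_bits = sample_bytes * BITS_PER_BYTE

# The 1.7 MB figure quoted in the article, converted to smaller units.
daily_megabytes = 1.7
daily_bytes = round(daily_megabytes * BYTES_PER_MEGABYTE)
daily_kilobytes = daily_bytes // BYTES_PER_KILOBYTE
daily_bits = daily_bytes * BITS_PER_BYTE

print(sample_bytes, sample_bits)  # 12 96
print(daily_bytes)                # 1700000
print(daily_kilobytes)            # 1700
print(daily_bits)                 # 13600000
```

Put another way, 1.7 MB is room for about 1.7 million plain-text characters.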
Why is this important to know? Every day, according to the website kimmandotech.com, the average person creates 1.7 MB of data through computer and smartphone use. Additionally, machine-generated data makes up about 40 percent of data. Machine-generated means that the data was produced without human input. Regardless, that is a lot of data and, according to the same source, by 2019, 90 percent of it had been generated in the preceding two years.
All that means that the average person’s life is undeniably affected by the retrieval and processing of data.
Wide Focus Versus Narrow Focus
Data science can be thought of as a “wide focus” for the data. If one views raw data in terms of photography, data science looks for a panoramic overview and data analytics looks to come in for a much more narrow focus. Data analytics is actually within the parameters of data science.
Data science isn’t worried about answering specific questions. What it does is take all this massive raw data and come at it from different angles. Doing this can reveal important insights that would not have been noticed through a narrower focus.
A Few More Differences
It is worthwhile to look at a few more differences between data analytics and data science. Whereas data science works in many coding languages, it is imperative for people working in data analysis to know Python and R. Data scientists must have an extensive knowledge of programming, while analysts need only basic programming skills. Additionally, data scientists make extensive use of machine learning and data mining. Analysts use Hadoop, which is a “framework” for processing and storing larger datasets.
While they are different disciplines, both data science and data analysis use advanced mathematics and statistics to solve problems. Both are concerned with big data and, as such, with problems that would go unresolved without the procedures and tools they use. Additionally, they are put into practice in the same fields. Business, for instance, uses historical data to foresee and avoid problems. It also uses new, raw data to increase innovation and productivity.
Medical facilities use both disciplines to arrive at new approaches to things such as automated surgeries and the analysis of medical images to diagnose tumors and other conditions while it is early enough to treat them successfully.
Preparation for Careers in These Fields
As the disciplines overlap, so do the educational pathways to achieve them. People who want careers in one or the other will take the same rudimentary courses with the emphasis on specific skills. People who want to become data scientists will spend more time in coding courses and in programming. Those who want to work in data analysis will want to be competent in tools such as Hadoop. So interconnected are these fields that universities often provide a pathway to move from data analysis to data science. Data scientists command higher salaries.
In 2019, the Bureau of Labor Statistics predicted that the growth rate for data scientists would be triple the average for all jobs. The field is growing at a faster rate than data analysis. To get jobs in either discipline, many sources recommend taking entry-level positions and getting mentors as soon as possible.
The Future of Data
To see where data science and data analysis might lead, simply look at the current state of data science. A Forbes article revealed that data science careers are currently at an all-time high, with an increase of over 75% in job postings for data scientists and analysts. While many of the jobs currently being held by data scientists are being automated, there are still several areas available for future data scientists to explore, ranging from industry specialists to data engineers.
Here are a few job openings posted recently on Indeed.com.
• Disney Media and Entertainment needs a data scientist who can do content analysis. The person would do data mining and analysis to explore audience trends.
• Facebook is looking for a data scientist with at least two years of experience in analytics.
• McKinsey and Company needs a data scientist who can use “geospatial, biological and climatological data” in arriving at agricultural solutions.
• Liberty Mutual Insurance needs someone to fill an Analyst II, Data Science position. The person would study global data to understand “shopper” behavior and improve and innovate shopper services.
In most cases, whether the job title was data scientist or data analyst, similar responsibilities and experience requirements were listed. The major modifier in each case was experience. Additionally, though jobs for “scientists” paid more than those for analysts, both listed high salaries and compensation packages.
Employees recruited to the Pony Express were expected to be small of physique, young and without family. Little else mattered. The Indeed job listings required applicants with knowledge and skills in programming and coding, and in data analysis. The Pony Express lasted only eighteen months before the telegraph replaced it. With that new technology, the dangerous and costly business of transporting mail overland by horse and rider became obsolete. The fields of data science and data analysis, by contrast, are only expected to grow. One reason for this is that they keep reinventing themselves. Both disciplines rely on tools derived from technology that their own work helped discover, and those tools in turn fuel the growth of data science and data analysis.
In the end, whether an individual is a data analyst or a data scientist, the relation between the two is both complex and symbiotic. The data scientist needs data analytics to provide the overview data necessary to consider new ways of looking at data and the data analyst needs the questions raised by data scientists to be answered in order to assist businesses and organizations in their operations.