How to do Data Format in R

[This article was first published on Methods – finnstats, and kindly contributed to R-bloggers].

In this tutorial, you’ll learn about data formats and how reformatting data can improve your data analysis.

Data is typically collected from a variety of sources by a variety of people, and it is stored in a variety of formats.

Data formatting is the process of transforming data into a standardized format that allows you to make meaningful comparisons.

Data formatting is an important part of dataset cleansing because it helps keep data consistent and easy to understand.

Take the example of a dataset containing city names: “Bangalore,” “Bengaluru,” and “Bnglr” are all different ways of writing the same city.

In the majority of cases, you’ll want to consider them all as a single unit, or format, to make statistical analysis easier later on.
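
A minimal sketch of that consolidation, assuming a hypothetical data frame `cities` with a `city` column (neither comes from the Airline dataset). It maps each known spelling variant to one canonical label with `case_when()` from dplyr and leaves every other value unchanged:

```r
library(dplyr)

# Hypothetical example data; the spellings mirror the variants mentioned above.
cities <- data.frame(
  city = c("Bangalore", "Bengaluru", "Bnglr", "Mumbai"),
  stringsAsFactors = FALSE
)

# Map every known variant to a single canonical name; keep other values as they are.
cities_clean <- cities %>%
  mutate(city = case_when(
    city %in% c("Bangalore", "Bengaluru", "Bnglr") ~ "Bengaluru",
    TRUE ~ city
  ))

# Count rows per city after standardization.
table(cities_clean$city)
```

Which spelling becomes the canonical one is a choice for the analyst; what matters is that every variant ends up in the same group.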

Data Format in R

This tutorial uses the same Airline dataset as one of our earlier posts.

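# tidyverse attaches dplyr, tidyr, and ggplot2, so one library() call is enough
library(tidyverse)

# Read the Airline dataset; the first row of the file holds the column names
data <- read.csv("D:/RStudio/Airlinedata.csv", header = TRUE)
head(data)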

There is a column called “FlightDate” in the Airline dataset. The “FlightDate” field is formatted as “year-month-day,” with 2003 as the year, 03 as March, and 28 as the day.

The “FlightDate” field can be separated into three columns: “year,” “month,” and “day.”

With the tidyverse, reformatting the date takes a single line of code. Other packages can do the same, but this tutorial sticks to the tidyverse, which one of our earlier posts also covered as one of the important packages for data science.

This example reformats the column with the separate() function, splitting the date at each hyphen and naming the three new columns “year,” “month,” and “day.”

data1 <- data %>%
  separate(FlightDate, sep = "-", into = c("year", "month", "day"))
head(data1)
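
Because the Airline CSV is not included with this post, here is a small sketch on made-up data that shows what `separate()` does. The data frame `flights_demo` and its two dates are assumptions for illustration only:

```r
library(tidyr)

# Made-up rows standing in for the Airline data.
flights_demo <- data.frame(
  FlightDate = c("2003-03-28", "2005-11-14"),
  stringsAsFactors = FALSE
)

# Split each date at the hyphens into three new columns.
# By default the original FlightDate column is dropped.
parts <- separate(flights_demo, FlightDate,
                  into = c("year", "month", "day"), sep = "-")

print(parts)
```

Each piece keeps its text form, so the month of the first row is the string "03" and not the number 3.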

The data type may be wrongly determined for a variety of reasons, including when importing a dataset into R or processing a variable.

For example, after the split the “year,” “month,” and “day” columns are stored as “character,” although the desired data type is numeric.

str(data1)

It’s critical to investigate the column’s data type and convert it to the correct data type for further analysis; otherwise, the models you later construct may act strangely, and valid data may be interpreted as missing data.
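
A short sketch of how a wrong type can mislead an analysis, using made-up values (`month_chr` and `amounts` are hypothetical, not columns of the Airline data). Text sorts character by character, and text that is not a plain number turns into `NA` when converted:

```r
# Made-up month values stored as text, as they would be after separate().
month_chr <- c("9", "10", "11")

# Sorting text compares character by character, so "10" and "11" come before "9".
sort(month_chr)

# After conversion the values sort in numeric order.
month_num <- as.numeric(month_chr)
sort(month_num)

# A value with a thousands separator is not a valid number for as.numeric(),
# so it becomes NA (R also emits a warning, suppressed here).
amounts <- c("12", "1,200")
converted <- suppressWarnings(as.numeric(amounts))
converted
```

Checking the column type before converting, and counting `NA` values after converting, helps catch both problems early.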

The sapply() function can apply typeof() to every column of a dataset, which shows each column’s data type at a glance.

sapply(data1, typeof)
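
One thing to keep in mind is that `typeof()` reports R's internal storage mode, which can differ from the class that most analyses care about. The sketch below uses a made-up data frame (`demo` is an assumption) to compare `typeof()` with `class()`:

```r
# Made-up data frame with a mix of column types.
demo <- data.frame(
  year    = c("2003", "2005"),                       # character
  delay   = c(12.5, 3),                              # double
  carrier = factor(c("AA", "UA")),                   # factor
  date    = as.Date(c("2003-03-28", "2005-11-14")),  # Date
  stringsAsFactors = FALSE
)

# typeof() shows how each column is stored internally:
# factors are stored as integer codes and dates as doubles.
storage <- sapply(demo, typeof)
storage

# class() shows the higher-level type of each column.
classes <- sapply(demo, class)
classes
```

For a quick check of character versus numeric columns either function works; `class()` is the safer choice when factors or dates may be present.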

If a column turns out to have the wrong type, you can convert it with dplyr’s mutate functions.

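data2 <- data1 %>%
  select(year, month, day) %>%
  # as.is = TRUE avoids the warning from type.convert() and keeps text as character
  mutate_all(type.convert, as.is = TRUE) %>%
  # coerce any column that is still character to numeric
  mutate_if(is.character, as.numeric)
str(data2)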
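
The same conversion can also be sketched with `across()`, assuming dplyr 1.0.0 or later is installed. The data frame `parts` below is made up to resemble the output of `separate()`; it is not the Airline data:

```r
library(dplyr)

# Made-up character columns, as separate() would produce them.
parts <- data.frame(
  year  = c("2003", "2005"),
  month = c("03", "11"),
  day   = c("28", "14"),
  stringsAsFactors = FALSE
)

# across() applies one function to several columns inside mutate().
# as.is = TRUE keeps non-numeric text as character instead of creating factors.
parts_num <- parts %>%
  mutate(across(everything(), ~ type.convert(.x, as.is = TRUE)))

# Every column should now report an integer class.
sapply(parts_num, class)
```

Another option is to pass `convert = TRUE` to `separate()`, which asks tidyr to run the type conversion on the new columns at the moment they are created.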

You learned in this tutorial that reformatting data is a method of bringing information into a common standard of expression, which allows you to make meaningful comparisons.
