R: Calculate the Average Temperature in Cities by Dates
Given a data set of temperature readings for different cities, each with a timestamp, report the average temperature of each city per day.
More formally, you are given an input data frame whose first column, “datetime”, contains the timestamp of each observation in yyyy-mm-dd hh:mm:ss format. Each of the other column headers is the name of a city, such as “Vancouver” or “New York”, and each row holds the temperatures of those cities at that timestamp.
Your task is to return a pivoted data frame of the following format:
- The first column “Date” should contain the distinct dates found in the data set in yyyy-mm-dd format.
- Each of the other columns should correspond to a city found in the data set.
- For each city and date, the table should contain the average temperature of that city on that day, rounded to two decimal places.
For example:
Date | New York | Vancouver |
2020-09-20 | | |
2020-09-21 | | |
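To make the expected shape concrete, here is a small sketch in base R. The temperature values and the name df_sample are invented for illustration and do not come from the original data set. The sketch only shows how several timestamps on the same day collapse into one average per city and date.

```r
# Hypothetical sample input: all values are invented for illustration only.
df_sample <- data.frame(
  datetime = c("2020-09-20 08:00:00", "2020-09-20 20:00:00",
               "2020-09-21 08:00:00"),
  `New York` = c(18.0, 21.0, 19.25),
  Vancouver = c(10.0, 14.0, 12.5),
  check.names = FALSE,       # keep the space in "New York"
  stringsAsFactors = FALSE
)

# Keep only the yyyy-mm-dd part of each timestamp.
dates <- substr(df_sample$datetime, 1, 10)

# Average each city's readings within a day, rounded to two decimal places.
new_york_daily  <- round(tapply(df_sample[["New York"]], dates, mean), 2)
vancouver_daily <- round(tapply(df_sample$Vancouver, dates, mean), 2)

print(new_york_daily)
print(vancouver_daily)
```

With these made-up readings, the two observations on 2020-09-20 are averaged into a single value per city, while 2020-09-21 keeps its single reading.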
Function Description
Complete the function calculate_average_temperature_in_cities_by_dates in the editor below.
calculate_average_temperature_in_cities_by_dates has the following parameter:
df_input: a data frame read from a CSV file
Constraints
- Each data frame consists of no more than 3000 rows.
- The instructions are complete and data is valid.
- Do not make any assumptions beyond the problem statement.
SOLUTION:
calculate_average_temperature_in_cities_by_dates <- function(df_data) {
  # Check input data
  if (!is.data.frame(df_data)) {
    stop("Input data is not a data frame")
  }
  if (!"datetime" %in% colnames(df_data)) {
    stop("Input data does not have a 'datetime' column")
  }
  # Convert non-numeric city columns to numeric.
  # List-style indexing (no comma) also works when there is only one city.
  city_cols <- setdiff(colnames(df_data), "datetime")
  df_data[city_cols] <- lapply(df_data[city_cols], function(x) {
    if (is.numeric(x)) {
      x
    } else {
      as.numeric(as.character(x))
    }
  })
  # Convert the 'datetime' column to POSIXlt type (fixed time zone avoids DST gaps)
  df_data$datetime <- as.POSIXlt(df_data$datetime, tz = "UTC", format = "%Y-%m-%d %H:%M:%S")
  # Extract the date from the 'datetime' column
  df_data$Date <- as.Date(df_data$datetime)
  # Remove the original 'datetime' column
  df_data <- subset(df_data, select = -c(datetime))
  # Calculate the average temperature for each city and date.
  # na.action = na.pass keeps rows with a missing reading, so na.rm = TRUE
  # skips NAs per city instead of discarding the whole row.
  df_output <- tryCatch({
    aggregate(. ~ Date, data = df_data, FUN = mean, na.rm = TRUE, na.action = na.pass)
  }, error = function(e) {
    stop("Error calculating average temperature: ", e$message)
  })
  # Sort the city names alphabetically
  df_output <- df_output[, c("Date", sort(colnames(df_output)[-1])), drop = FALSE]
  # Round the average temperatures to two decimal places
  df_output[-1] <- round(df_output[-1], 2)
  return(df_output)
}
# DO NOT CHANGE THIS CODE
# Open connection
fptr <- file(Sys.getenv("OUTPUT_PATH"))
open(fptr, open = "w")
# Read input 'csv' file
df_input <- read.csv("/dev/stdin", stringsAsFactors = FALSE)
# Process result data set
df_output <- calculate_average_temperature_in_cities_by_dates(df_input)
# Save results as 'csv' file
write.table(df_output, fptr, sep = ",", row.names = FALSE, col.names = TRUE)
# Close connection
close(fptr)
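
One detail worth checking in the aggregation step is how missing readings are handled. The formula interface of aggregate removes every row that contains an NA by default (na.action = na.omit), which happens before FUN is called, so na.rm = TRUE alone does not protect the other cities in that row. The sketch below contrasts that default with na.action = na.pass. The data frame df_demo, the city "Toronto", and all values are hypothetical and used only for illustration.

```r
# Hypothetical readings: Vancouver is missing one observation on 2020-09-20.
df_demo <- data.frame(
  Date = as.Date(c("2020-09-20", "2020-09-20", "2020-09-21")),
  Vancouver = c(10, NA, 12),
  Toronto = c(20, 22, 18)
)

# Default na.action (na.omit) drops the whole second row, so Toronto's
# reading of 22 is lost and its 2020-09-20 mean becomes 20 instead of 21.
dropped <- aggregate(. ~ Date, data = df_demo, FUN = mean, na.rm = TRUE)

# na.pass keeps every row; na.rm = TRUE then skips NAs column by column.
kept <- aggregate(. ~ Date, data = df_demo, FUN = mean, na.rm = TRUE,
                  na.action = na.pass)

print(dropped)
print(kept)
```

If the input is guaranteed to contain no missing values, both calls give the same result; the difference only matters when some city lacks a reading at a given timestamp.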