R: Calculate the Average Temperature in Cities by Dates

Given a data set of timestamped temperature readings for several cities, report each city's average temperature per day.

More formally, you are given an input data frame whose first column, “datetime”, contains the timestamp of each observation in yyyy-mm-dd hh:mm:ss format. Each remaining column header is the name of a city, such as “Vancouver” or “New York”, and the column holds that city's temperature at each timestamp.

Your task is to return a pivoted data frame of the following format:

  • The first column, “Date”, should contain the distinct dates found in the data set, in yyyy-mm-dd format.
  • Each of the other columns should correspond to one city found in the data set.
  • For each city and date, the table should contain the city's average temperature on that day, rounded to two decimal places.

 

For example:

Date        New York  Vancouver
2020-09-20
2020-09-21
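
To make the expected shape concrete, here is a minimal sketch in R. The readings are made up for illustration and are not taken from the problem's data set; the column name New.York assumes the header was sanitized the way read.csv does by default.

```r
# Hypothetical input: two cities, four observations over two days.
df_example <- data.frame(
  datetime  = c("2020-09-20 00:00:00", "2020-09-20 08:00:00",
                "2020-09-20 16:00:00", "2020-09-21 12:00:00"),
  Vancouver = c(10, 10, 11, 12.5),
  New.York  = c(20, 21, 22, 18),
  stringsAsFactors = FALSE
)

# Keep only the date part (the first 10 characters of yyyy-mm-dd hh:mm:ss).
dates <- as.Date(substr(df_example$datetime, 1, 10))

# Average each city per date, then round to two decimal places.
city_cols <- setdiff(names(df_example), "datetime")
df_avg <- aggregate(df_example[city_cols], by = list(Date = dates), FUN = mean)
df_avg[city_cols] <- round(df_avg[city_cols], 2)

print(df_avg)
```

With these made-up values, 2020-09-20 averages to 10.33 for Vancouver and 21 for New York, and 2020-09-21 keeps its single readings of 12.5 and 18.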

 

Function Description

Complete the function calculate_average_temperature_in_cities_by_dates in the editor below.

calculate_average_temperature_in_cities_by_dates has the following parameter:

    df_input: a data frame read from a CSV file
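
As a rough illustration of what such a data frame may look like, the sketch below reads a small CSV string with read.csv. The sample rows are assumptions made for this example. One thing to watch: with the default check.names = TRUE, read.csv rewrites a header such as "New York" to New.York.

```r
# Hypothetical CSV content in the layout the problem describes.
csv_text <- "datetime,Vancouver,New York
2020-09-20 06:00:00,10.5,20.1
2020-09-20 18:00:00,14.5,22.3"

# Default behaviour: headers are made syntactically valid R names.
df_default <- read.csv(text = csv_text, stringsAsFactors = FALSE)
print(names(df_default))

# check.names = FALSE keeps the headers exactly as written in the file.
df_verbatim <- read.csv(text = csv_text, stringsAsFactors = FALSE,
                        check.names = FALSE)
print(names(df_verbatim))
```

The first call yields the column New.York and the second yields "New York", so any code that looks up cities by name needs to match whichever form the reader produced.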

Constraints

  • Each data frame consists of no more than 3000 rows.
  • The instructions are complete and data is valid.
  • Do not make any assumptions beyond the problem statement.

SOLUTION:

				
calculate_average_temperature_in_cities_by_dates <- function(df_data) {
  
  # Check input data
  if (!is.data.frame(df_data)) {
    stop("Input data is not a data frame")
  }
  
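  # Convert non-numeric city columns to numeric.
  # df_data[-1] (rather than df_data[, -1]) stays a data frame even when
  # there is only one city column, so lapply always iterates over columns.
  df_data[-1] <- lapply(df_data[-1], function(x) {
    if (is.numeric(x)) {
      x
    } else {
      as.numeric(as.character(x))
    }
  })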
  
  # Convert the 'datetime' column to POSIXlt type
  if (!"datetime" %in% colnames(df_data)) {
    stop("Input data does not have a 'datetime' column")
  }
  df_data$datetime <- as.POSIXlt(df_data$datetime, format = "%Y-%m-%d %H:%M:%S")
  
  # Extract the date from the 'datetime' column
  df_data$Date <- as.Date(df_data$datetime)
  
  # Remove the original 'datetime' column
  df_data <- subset(df_data, select = -c(datetime))
  
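  # Calculate the average temperature for each city and date.
  # na.action = na.pass keeps rows with a missing reading, so na.rm = TRUE
  # can skip the gap per city instead of the whole row being dropped.
  df_output <- tryCatch({
    aggregate(. ~ Date, data = df_data, FUN = mean, na.rm = TRUE, na.action = na.pass)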
  }, error = function(e) {
    stop("Error calculating average temperature: ", e$message)
  })
  
  # Sort the city names alphabetically
  df_output <- df_output[, c("Date", sort(colnames(df_output)[-1]))]
  
  # Round the average temperatures to two decimal points
  df_output[, -1] <- round(df_output[, -1], 2)
  
  return(df_output)
}

# DO NOT CHANGE THIS CODE

# Open connection
fptr <- file(Sys.getenv("OUTPUT_PATH"))
open(fptr, open = "w")

# Read input 'csv' file
df_input <- read.csv("/dev/stdin", stringsAsFactors = FALSE)

# Process result data set
df_output <- calculate_average_temperature_in_cities_by_dates(df_input)

# Save results as 'csv' file 
write.table(df_output, fptr, sep = ",", row.names = FALSE, col.names = TRUE)

# Close connection
close(fptr)
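
A note on missing readings in the aggregation step: the formula interface of aggregate() uses na.action = na.omit by default, which removes any row containing an NA before FUN is called. Passing na.rm = TRUE alone therefore does not preserve the other cities' values on such rows. The sketch below uses made-up data and assumes missing readings can occur at all, which the constraints may rule out.

```r
# Hypothetical data: New.York is missing one reading on 2020-09-20.
df_na <- data.frame(
  Date      = as.Date(c("2020-09-20", "2020-09-20")),
  Vancouver = c(10, 14),
  New.York  = c(NA, 22)
)

# Default na.action = na.omit: the first row is dropped entirely,
# so Vancouver's average is computed from the second row only.
avg_default <- aggregate(. ~ Date, data = df_na, FUN = mean, na.rm = TRUE)

# na.action = na.pass: all rows reach mean(), and na.rm = TRUE skips
# the missing New.York value while keeping both Vancouver readings.
avg_pass <- aggregate(. ~ Date, data = df_na, FUN = mean, na.rm = TRUE,
                      na.action = na.pass)

print(avg_default)
print(avg_pass)
```

Here the Vancouver average is 14 with the default and 12 with na.pass, while New.York is 22 in both cases.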