# devtools::install_github("hadley/lineprof")
library(lineprof)
Profiling measures where code spends its time and memory, so that bottlenecks can be found and removed before attempting any other optimisation.
code = '
read_delim <- function(file, header = TRUE, sep = ",") {
  # Determine number of fields by reading first line
  first <- scan(file, what = character(1), nlines = 1,
    sep = sep, quiet = TRUE)
  p <- length(first)

  # Load all fields as character vectors
  all <- scan(file, what = as.list(rep("character", p)),
    sep = sep, skip = if (header) 1 else 0, quiet = TRUE)

  # Convert from strings to appropriate types (never to factors)
  all[] <- lapply(all, type.convert, as.is = TRUE)

  # Set column names
  if (header) {
    names(all) <- first
  } else {
    names(all) <- paste0("V", seq_along(all))
  }

  # Convert list into data frame
  as.data.frame(all)
}
'
write(code, "source.R")
source("source.R") # required: lineprof matches timings to lines via srcrefs, which exist only for code loaded from a file
library(ggplot2)
write.csv(diamonds, "diamonds.csv", row.names = FALSE)
l <- lineprof(read_delim("diamonds.csv"))
l
## Reducing depth to 2 (from 16)
## time alloc release dups ref
## 1 0.005 0.018 0.000 2 "lazyLoadDBfetch"
## 2 0.001 0.003 0.000 0 "scan"
## 3 0.020 0.005 0.000 62 c("scan", "file")
## 4 0.004 0.006 0.000 1 "scan"
## 5 0.001 0.003 0.000 1 c("scan", "close")
## 6 0.001 0.007 0.000 0 "scan"
## 7 0.022 0.003 0.000 62 c("scan", "file")
## 8 0.001 0.003 0.000 1 c("scan", "as.list")
## 9 0.001 0.001 0.000 1 c("scan", "identical")
## 10 10.991 2.359 0.890 0 "scan"
## 11 0.007 0.002 0.000 1 c("scan", "close")
## 12 0.002 0.004 0.000 0 c("lapply", "match.fun")
## 13 0.001 0.003 0.000 0 "lapply"
## 14 2.709 0.227 0.337 15 c("lapply", "FUN")
## 15 0.001 0.001 0.000 0 "lapply"
## 16 3.323 0.344 0.000 18 c("lapply", "FUN")
## 17 0.001 0.002 0.000 1 character(0)
## 18 0.001 0.001 0.000 0 "as.data.frame"
## 19 0.008 0.022 0.000 0 c("as.data.frame", "lazyLoadDBfetch")
## 20 0.361 0.931 0.524 294 c("as.data.frame", "as.data.frame.list")
## 21 0.001 0.000 0.000 0 character(0)
## src
## 1 lazyLoadDBfetch
## 2 scan
## 3 scan/file
## 4 scan
## 5 scan/close
## 6 scan
## 7 scan/file
## 8 scan/as.list
## 9 scan/identical
## 10 scan
## 11 scan/close
## 12 lapply/match.fun
## 13 lapply
## 14 lapply/FUN
## 15 lapply
## 16 lapply/FUN
## 17
## 18 as.data.frame
## 19 as.data.frame/lazyLoadDBfetch
## 20 as.data.frame/as.data.frame.list
## 21
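To make the printed profile easier to read, the slowest rows can be ranked by hand. The sketch below copies four rows (10, 14, 16 and 20) from the output above into an ordinary data frame and sums the time per source location. The names `top_rows` and `by_src` are illustrative only; they are not created by lineprof.

```r
# Rows copied by hand from the printed profile above (time in seconds).
# `top_rows` is an illustrative name, not an object produced by lineprof.
top_rows <- data.frame(
  row  = c(10, 14, 16, 20),
  time = c(10.991, 2.709, 3.323, 0.361),
  src  = c("scan", "lapply/FUN", "lapply/FUN",
           "as.data.frame/as.data.frame.list"),
  stringsAsFactors = FALSE
)

# Total time per source location, slowest first
by_src <- aggregate(time ~ src, data = top_rows, FUN = sum)
by_src <- by_src[order(by_src$time, decreasing = TRUE), ]
by_src
```

In this particular run the `scan` row accounts for most of the time, followed by the two `lapply/FUN` rows (the type conversion), with the conversion to a data frame a distant third.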
The results are easier to explore interactively, using the explorer that lineprof builds on the shiny package:
library(shiny)
# opens a web page showing the source code, annotated with how long each line took to run
shine(l)
The t column shows how much time, in seconds, is spent on each line.
The a column shows the memory (in megabytes) allocated by that line of code.
The r column shows the memory (in megabytes) released by that line of code. This value can vary between runs, because it depends on when the garbage collector runs.
The d column shows the number of vector duplications that occurred. A vector duplication occurs when R copies a vector as a result of its copy-on-modify semantics.
To see the exact values, hover the mouse over the corresponding bar.
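The kind of duplication counted in the d column can be observed directly with base R's `tracemem()`. This is a minimal sketch, independent of lineprof; it assumes an R build with memory profiling enabled (checked with `capabilities("profmem")`), and the variable names are illustrative.

```r
x <- c(1, 2, 3)
y <- x        # no copy yet: x and y refer to the same vector

# tracemem() reports each duplication of the traced vector;
# it is only available when R was built with memory profiling.
if (capabilities("profmem")) tracemem(x)

y[1] <- 10    # modifying y forces R to duplicate the shared vector

if (capabilities("profmem")) untracemem(x)

x             # unchanged by the modification of y
y
```

When tracing is available, R prints a `tracemem[...]` line at the assignment to `y[1]`, which is the copy that copy-on-modify makes so that `x` keeps its original values.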