14  Appendix II: Garbage collection gc()

R reclaims memory that is no longer in use through automatic garbage collection. The gc() function triggers a collection on demand and reports current memory usage, which is useful when working with large datasets or memory-intensive computations.

14.0.0.1 Purpose of gc()

  • Memory Management: gc() triggers a garbage collection immediately, freeing memory held by objects that are no longer referenced. R also collects automatically, so an explicit call matters mostly during data-intensive operations where memory is tight.
  • Memory Usage Reporting: When called, gc() also returns a report on current memory usage, providing insights into how much memory is being utilized.
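
As a minimal sketch of both roles, the call below triggers a collection and keeps the report for inspection. The variable name mem is only illustrative.

```r
# Trigger a collection and keep the usage report it returns
mem <- gc()

# The report is a numeric matrix with one row per memory pool
rownames(mem)    # "Ncells" "Vcells"
mem[, "used"]    # cells currently in use, per pool
mem[, 2]         # the same usage in Mb (second column)
```

Several columns share the name "(Mb)", so the megabyte columns are easier to select by position than by name.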

14.0.1 Output of gc()

When you run gc(), you receive a matrix output with the following structure:

          used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
Ncells  652520 34.9    1438668 76.9         NA   818710 43.8
Vcells 1789652 13.7    8388608 64.0      16384  1963270 15.0

14.0.1.1 Explanation of the Output Rows and Columns

  1. Ncells: The row for cons cells, the fixed-size nodes R uses for language objects such as pairlists and environments (usually 56 bytes each on 64-bit systems).
    • used: The number of cons cells currently in use.
    • (Mb): The equivalent memory in megabytes, rounded up to 0.1.
    • gc trigger: The number of cons cells at which the next garbage collection will be triggered.
    • limit (Mb): The maximum memory allowed for cons cells, in megabytes; NA means no limit is set. This column appears only when a limit has been set for at least one row.
    • max used: The maximum number of cons cells used since the last call to gc(reset = TRUE), or since R started.
  2. Vcells: The row for vector cells (8 bytes each), the variable-sized storage that holds the contents of vectors, matrices, and data frames.
    • The columns have the same meaning as for Ncells, applied to vector allocations.
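
The (Mb) figures follow directly from the cell counts. The sketch below reproduces the "used" values of the sample output, assuming a 64-bit build in which a cons cell takes 56 bytes and a vector cell 8 bytes. The variable names are illustrative.

```r
# Assumed cell sizes on a 64-bit build of R
bytes_per_ncell <- 56
bytes_per_vcell <- 8

# Cell counts taken from the "used" column of the sample output
ncells_used <- 652520
vcells_used <- 1789652

# Convert to megabytes (1 Mb = 1024^2 bytes) and round up to 0.1
to_mb <- function(cells, bytes_per_cell) {
  ceiling(cells * bytes_per_cell / 1024^2 * 10) / 10
}

ncells_mb <- to_mb(ncells_used, bytes_per_ncell)  # 34.9
vcells_mb <- to_mb(vcells_used, bytes_per_vcell)  # 13.7
```

Both results match the (Mb) values printed next to "used" in the sample output.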

14.0.1.2 Interpretation of Values

  • Used Memory: Indicates how much memory is currently being utilized by your R session (both Ncells and Vcells).
  • GC Trigger: Shows the threshold for triggering garbage collection; when usage exceeds this threshold, R automatically performs a collection.
  • Limit: Displays any limit set on memory usage, in megabytes; NA indicates no limit is enforced.
  • Max Used: Reflects the peak memory usage since the last call to gc(reset = TRUE), or since the start of the R session.
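
One way to use the Max Used figures is to reset them before an operation and read them afterwards. The sketch below assumes a made-up workload; x, total, and peak_mb are illustrative names.

```r
# Reset the "max used" statistics before the operation of interest
invisible(gc(reset = TRUE))

# Hypothetical workload: allocate and summarise a large numeric vector
x <- rnorm(1e6)
total <- sum(x^2)

# Read the statistics again; the last column is peak usage in Mb
after <- gc()
peak_mb <- after[, ncol(after)]
peak_mb
```

The peak is recorded per pool, so peak_mb holds one value for Ncells and one for Vcells.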

14.0.2 Best Practices for Using gc()

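  1. Call After Large Object Removal: R reclaims the memory of removed objects automatically at the next collection, but calling gc() right after rm() on a large object forces that collection now and may prompt R to return the memory to the operating system.
Code
large_object <- numeric(1e7) # Example object, about 80 MB of doubles
rm(large_object)
gc() # Reclaim the memory after removing the large object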
  2. Use in Long Loops: If each iteration of a loop creates large temporary objects, consider calling gc() periodically. R already collects automatically and every explicit call takes time, so call it every so many iterations rather than on every pass.
Code
for (i in 1:10000) {
    # Heavy computations
    if (i %% 100 == 0) { # Call gc() every 100 iterations
        gc()
    }
}
  3. Monitor Memory Usage: Use gc() to monitor memory usage during long-running processes or when working with large datasets to avoid running out of memory.

  4. Combine with gcinfo(): Use gcinfo(TRUE) to enable verbose output about automatic garbage collections, helping you understand when and how often garbage collection occurs.
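
The last two practices can be sketched together. The workload and the helper used_mb() below are hypothetical and not part of base R; gcinfo() returns the previous setting, which makes it easy to restore.

```r
# Turn on reporting of automatic collections and remember the old setting
old_setting <- gcinfo(TRUE)

# Hypothetical workload that allocates repeatedly
chunks <- lapply(1:20, function(i) numeric(1e6))

# Restore the previous setting so later code is not flooded with messages
gcinfo(old_setting)

# Hypothetical helper: total memory in use (Mb), summed over both pools
used_mb <- function() sum(gc()[, 2])
used_mb()
```

While the setting is on, R prints a short message each time an automatic collection runs, which shows how often the workload triggers one.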

14.0.3 Conclusion

The gc() function is an essential tool for managing memory in R, especially when dealing with large datasets or complex analyses. By understanding its output and implementing best practices for calling it, you can optimize your R environment for better performance and efficiency.
