Merges two literature matrices on the specified key column(s). Supports inner or full joins and can optionally remove duplicate columns before merging.

Usage

merge_matrix(
  .data,
  .data2,
  by = NULL,
  all = FALSE,
  remove_dups = TRUE,
  suffixes = c(".x", ".y"),
  silent = FALSE
)

Arguments

.data

A data frame to be merged.

.data2

A second data frame to be merged with `.data`.

by

A character vector specifying the column(s) to merge by. Must exist in both data frames.

all

A logical value indicating whether to perform a full join (`TRUE`) or an inner join (`FALSE`, default).

remove_dups

A logical value indicating whether to remove duplicate columns before merging. Default is `TRUE`.

suffixes

A character vector of length 2 specifying suffixes to apply to overlapping column names from `.data` and `.data2`, respectively. Default is `c(".x", ".y")`.

silent

A logical value indicating whether to suppress messages about duplicate column removal. Default is `FALSE`.

Value

A merged data frame with specified join conditions applied.

Details

The function first ensures that `.data` and `.data2` are valid data frames and checks that the `by` columns exist in both. If `remove_dups = TRUE`, duplicate columns are removed before merging. The function then performs either a full or inner join using `dplyr::full_join()` or `dplyr::inner_join()`, respectively.
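The steps described above can be sketched roughly as follows. This is an illustrative re-implementation, not the package's source: `merge_sketch` is a hypothetical name, and the reading of "duplicate columns" as repeated column names within a single data frame is an assumption. It requires the dplyr package to be installed.

```r
# Hypothetical sketch of the documented flow; not the actual merge_matrix() source.
merge_sketch <- function(x, y, by, all = FALSE, remove_dups = TRUE,
                         suffixes = c(".x", ".y"), silent = FALSE) {
  # Step 1: both inputs must be data frames
  stopifnot(is.data.frame(x), is.data.frame(y))

  # Step 2: every `by` column must exist in both inputs
  missing_cols <- setdiff(by, intersect(names(x), names(y)))
  if (length(missing_cols) > 0) {
    stop("`by` column(s) missing from one or both data frames: ",
         paste(missing_cols, collapse = ", "))
  }

  # Step 3 (assumption): drop columns whose name repeats within one data frame
  if (remove_dups) {
    if (!silent) message("Removing duplicate columns...")
    x <- x[, !duplicated(names(x)), drop = FALSE]
    y <- y[, !duplicated(names(y)), drop = FALSE]
  }

  # Step 4: full join when all = TRUE, inner join otherwise
  join <- if (all) dplyr::full_join else dplyr::inner_join
  join(x, y, by = by, suffix = suffixes)
}

df1 <- data.frame(id = c(1, 2, 3), value1 = c("A", "B", "C"))
df2 <- data.frame(id = c(2, 3, 4), value2 = c("X", "Y", "Z"))

res_inner <- merge_sketch(df1, df2, by = "id", silent = TRUE)
res_full  <- merge_sketch(df1, df2, by = "id", all = TRUE, silent = TRUE)
```

Selecting the join function once and calling it with shared arguments keeps the two branches from drifting apart; the real function may be structured differently.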

Examples

df1 <- data.frame(id = c(1, 2, 3), value1 = c("A", "B", "C"))
df2 <- data.frame(id = c(2, 3, 4), value2 = c("X", "Y", "Z"))

# Inner join (default)
merge_matrix(df1, df2, by = "id")
#> Removing duplicate columns...
#>   id value1 value2
#> 1  2      B      X
#> 2  3      C      Y

# Full join
merge_matrix(df1, df2, by = "id", all = TRUE)
#> Removing duplicate columns...
#>   id value1 value2
#> 1  1      A   <NA>
#> 2  2      B      X
#> 3  3      C      Y
#> 4  4   <NA>      Z

# With remove_dups = TRUE; the shared non-key column `extra` receives the default suffixes
df3 <- data.frame(id = c(1, 2, 3), value1 = c("A", "B", "C"), extra = c(1, 2, 3))
df4 <- data.frame(id = c(2, 3, 4), value2 = c("X", "Y", "Z"), extra = c(4, 5, 6))
merge_matrix(df3, df4, by = "id", remove_dups = TRUE)
#> Removing duplicate columns...
#>   id value1 extra.x value2 extra.y
#> 1  2      B       2      X       4
#> 2  3      C       3      Y       5
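
Since the Details section says the join is delegated to dplyr, the effect of custom suffixes can be illustrated with the underlying dplyr call directly. This is a sketch under the assumption that `merge_matrix()` passes `suffixes` through to dplyr's `suffix` argument unchanged; the suffix strings `"_left"` and `"_right"` are chosen only for illustration.

```r
# Illustration with dplyr directly; assumes merge_matrix() forwards `suffixes`
# to the `suffix` argument of the dplyr join.
df3 <- data.frame(id = c(1, 2, 3), value1 = c("A", "B", "C"), extra = c(1, 2, 3))
df4 <- data.frame(id = c(2, 3, 4), value2 = c("X", "Y", "Z"), extra = c(4, 5, 6))

res_suffix <- dplyr::inner_join(df3, df4, by = "id",
                                suffix = c("_left", "_right"))

# Overlapping non-key column `extra` is disambiguated by the suffixes
names(res_suffix)
```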