Merge Two Literature Matrices by Common Columns

This function merges two literature matrices on the specified key columns. It performs either an inner join or a full join and can remove duplicate columns before merging.

Usage

merge_matrix(
  .data,
  .data2,
  by = NULL,
  all = FALSE,
  remove_dups = TRUE,
  suffixes = c(".x", ".y"),
  silent = FALSE
)

Arguments

.data: A data frame to be merged.
.data2: A second data frame to be merged with `.data`.
by: A character vector specifying the column(s) to merge by. Must exist in both data frames.
all: A logical value indicating whether to perform a full join (`TRUE`) or an inner join (`FALSE`, default).
remove_dups: A logical value indicating whether to remove duplicate columns before merging. Default is `TRUE`.
suffixes: A character vector of length 2 specifying suffixes to apply to overlapping column names from `.data` and `.data2`, respectively. Default is `c(".x", ".y")`.
silent: A logical value indicating whether to suppress messages about duplicate column removal. Default is `FALSE`.

Value

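A data frame combining `.data` and `.data2`, matched on the `by` columns. With `all = FALSE` it contains only rows whose keys appear in both inputs; with `all = TRUE` it contains all rows from both inputs, with `NA` where a row has no match.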

Details

The function first ensures that `.data` and `.data2` are valid data frames and checks that the `by` columns exist in both. If `remove_dups = TRUE`, duplicate columns are removed before merging. The function then performs either a full or inner join using `dplyr::full_join()` or `dplyr::inner_join()`, respectively.
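The three steps described above can be sketched as follows. This is a minimal illustration, not the package's source: `merge_sketch()` is a hypothetical stand-in, and it assumes that "duplicate columns" means repeated column names within a single data frame and that `suffixes` is forwarded to the `suffix` argument of the dplyr join.

```r
library(dplyr)

# Hypothetical stand-in for merge_matrix(); not the package's actual source.
merge_sketch <- function(x, y, by, all = FALSE, remove_dups = TRUE,
                         suffixes = c(".x", ".y"), silent = FALSE) {
  # Step 1: validate the inputs and the key columns.
  stopifnot(is.data.frame(x), is.data.frame(y), is.character(by))
  missing_keys <- setdiff(by, intersect(names(x), names(y)))
  if (length(missing_keys) > 0) {
    stop("`by` columns missing from one or both data frames: ",
         paste(missing_keys, collapse = ", "))
  }

  # Step 2 (assumption): treat repeated column names within one data frame
  # as duplicates and keep only the first occurrence of each name.
  if (remove_dups) {
    if (!silent) message("Removing duplicate columns...")
    x <- x[, !duplicated(names(x)), drop = FALSE]
    y <- y[, !duplicated(names(y)), drop = FALSE]
  }

  # Step 3: full join when all = TRUE, inner join otherwise.
  join <- if (all) dplyr::full_join else dplyr::inner_join
  join(x, y, by = by, suffix = suffixes)
}

df1 <- data.frame(id = c(1, 2, 3), value1 = c("A", "B", "C"))
df2 <- data.frame(id = c(2, 3, 4), value2 = c("X", "Y", "Z"))

inner <- merge_sketch(df1, df2, by = "id", silent = TRUE)
full  <- merge_sketch(df1, df2, by = "id", all = TRUE, silent = TRUE)
```

Selecting the join function up front keeps the `all` switch in one place, so the key and suffix handling is written only once for both join types.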

Examples

df1 <- data.frame(id = c(1, 2, 3), value1 = c("A", "B", "C"))
df2 <- data.frame(id = c(2, 3, 4), value2 = c("X", "Y", "Z"))

# Inner join (default)
merge_matrix(df1, df2, by = "id")
#> Removing duplicate columns...
#>   id value1 value2
#> 1  2      B      X
#> 2  3      C      Y

# Full join
merge_matrix(df1, df2, by = "id", all = TRUE)
#> Removing duplicate columns...
#>   id value1 value2
#> 1  1      A   <NA>
#> 2  2      B      X
#> 3  3      C      Y
#> 4  4   <NA>      Z

# remove_dups = TRUE with a shared non-key column: `extra` is kept from both
# inputs and disambiguated with the default suffixes
df3 <- data.frame(id = c(1, 2, 3), value1 = c("A", "B", "C"), extra = c(1, 2, 3))
df4 <- data.frame(id = c(2, 3, 4), value2 = c("X", "Y", "Z"), extra = c(4, 5, 6))
merge_matrix(df3, df4, by = "id", remove_dups = TRUE)
#> Removing duplicate columns...
#>   id value1 extra.x value2 extra.y
#> 1  2      B       2      X       4
#> 2  3      C       3      Y       5