Using a column within the data frame, categorize rows as a binary "Yes" or "No", or with a custom set of category names. Rows can be categorized by whether the column entry contains the given characters (a partial match) or equals them exactly. This is especially useful for turning ID tags into categories for analysis, such as morphology, bleaching or taxonomy.

Usage

categorize(data, column, values, name, binary = TRUE, exact = TRUE, categories)

Arguments

data

The data frame.

column

The column name which contains the data on which to categorize rows.

values

The characters or parts of characters to use to classify rows.

name

The name of the new column of categories.

binary

If binary = TRUE, the name column will contain "Yes" for rows where the characters, or parts of characters, specified by values are present, and "No" for rows where none of them are present. If binary = FALSE, categories must be provided; these are used to label the rows that match each element of values.
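The binary labelling described above can be pictured with a few lines of base R. This is a minimal sketch of the idea only, not the function's implementation: it assumes a literal partial match via grepl(), and the object names hit and shallow are hypothetical.

```r
# Transect labels taken from the Examples section of this page
transect <- c("1-Deep", "1-Shallow", "2-Shallow", "1-Shallow", "1-Deep", "1-Deep")

# TRUE where the label contains "Shallow"
# (fixed = TRUE treats the value as a literal string, not a regular expression)
hit <- grepl("Shallow", transect, fixed = TRUE)

# Hypothetical stand-in for the binary column: "Yes" for a match, "No" otherwise
shallow <- ifelse(hit, "Yes", "No")
shallow
```

The resulting vector lines up with the Shallow column shown in the first example under Examples.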

exact

If exact = TRUE, only exact matches will be selected. If exact = FALSE, rows will be selected if they contain the characters in the values vector, whether or not the match is exact.
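The difference between the two settings can be illustrated in base R. The sketch below assumes that an exact match means the whole entry equals a value and that a partial match means the entry contains the value as a literal string; how the function itself performs the comparison is not stated on this page, so treat this as an illustration only.

```r
# Transect labels taken from the Examples section of this page
transect <- c("1-Deep", "1-Shallow", "2-Shallow", "1-Shallow", "1-Deep", "1-Deep")

# Exact matching: the whole entry must equal the value.
# No entry is exactly "Shallow", so nothing matches.
exact_shallow <- transect %in% "Shallow"

# Exact matching against a complete label does find rows
exact_deep1 <- transect %in% "1-Deep"

# Partial matching: the entry only needs to contain the value
partial_shallow <- grepl("Shallow", transect, fixed = TRUE)

exact_shallow
exact_deep1
partial_shallow
```

This is why the examples on this page, which match on "Shallow" and "Deep" inside longer transect labels, set exact = FALSE.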

categories

The factor names denoting the presence of the characters or parts of characters specified by values. These must be specified in the same order as the corresponding elements in values.
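The pairing of values and categories by position can be sketched in base R as follows. This is an illustration under assumptions, not the function's code: it uses literal partial matching, the name depth is hypothetical, and leaving unmatched rows as NA is a choice made for this sketch, since the page does not say how such rows are handled.

```r
# Transect labels taken from the Examples section of this page
transect <- c("1-Deep", "1-Shallow", "2-Shallow", "1-Shallow", "1-Deep", "1-Deep")

values     <- c("Shallow", "Deep")
categories <- c("S", "D")  # categories[i] labels rows matching values[i]

# Start with NA everywhere (assumption: unmatched rows stay NA in this sketch)
depth <- rep(NA_character_, length(transect))

# Walk through values and categories in parallel
for (i in seq_along(values)) {
  hit <- grepl(values[i], transect, fixed = TRUE)
  depth[hit] <- categories[i]
}

depth
```

Swapping the order of categories without swapping values would label shallow transects "D" and deep ones "S", which is why the two vectors must be given in matching order.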

Value

A data frame with the new categorization column.

Examples

Sites <- as.factor(c("One", "One", "One", "Two", "Two", "Three"))
Transect <- as.factor(c("1-Deep", "1-Shallow", "2-Shallow", "1-Shallow", "1-Deep", "1-Deep"))
Acropora.sp <- c(0.1, 0.6, 0.4, 0.9, 0.2, 0.5)
Gardineroseris.sp <- c(0.4, 0.9, 0.5, 0.23, 0.5, NA)
Psammocora.sp <- c(0.9, 0.6, 0.5, 0.8, 0.1, 0.4)
Leptastrea.sp <- c(0.5, 0.7, 0.4, 0.8, 0.2, NA)
Notes <- c(NA, NA, "saw octopus", NA, "white balance corrected", NA)
coral_cover <- data.frame(Sites, Transect, Acropora.sp, Gardineroseris.sp,
                          Psammocora.sp, Leptastrea.sp, Notes)

# Classify shallow transects in a binary column
categorize(data = coral_cover, column = "Transect", values = "Shallow",
    name = "Shallow", binary = TRUE, exact = FALSE)
#>   Sites  Transect Shallow Acropora.sp Gardineroseris.sp Psammocora.sp
#> 1   One    1-Deep      No         0.1              0.40           0.9
#> 2   One 1-Shallow     Yes         0.6              0.90           0.6
#> 3   One 2-Shallow     Yes         0.4              0.50           0.5
#> 4   Two 1-Shallow     Yes         0.9              0.23           0.8
#> 5   Two    1-Deep      No         0.2              0.50           0.1
#> 6 Three    1-Deep      No         0.5                NA           0.4
#>   Leptastrea.sp                   Notes
#> 1           0.5                    <NA>
#> 2           0.7                    <NA>
#> 3           0.4             saw octopus
#> 4           0.8                    <NA>
#> 5           0.2 white balance corrected
#> 6            NA                    <NA>

# Classify depth of transect in a new column based on transect name
categorize(data = coral_cover, column = "Transect", values = c("Shallow", "Deep"),
    name = "Depth", binary = FALSE, categories = c("S", "D"), exact = FALSE)
#>   Sites  Transect Depth Acropora.sp Gardineroseris.sp Psammocora.sp
#> 1   One    1-Deep     D         0.1              0.40           0.9
#> 2   One 1-Shallow     S         0.6              0.90           0.6
#> 3   One 2-Shallow     S         0.4              0.50           0.5
#> 4   Two 1-Shallow     S         0.9              0.23           0.8
#> 5   Two    1-Deep     D         0.2              0.50           0.1
#> 6 Three    1-Deep     D         0.5                NA           0.4
#>   Leptastrea.sp                   Notes
#> 1           0.5                    <NA>
#> 2           0.7                    <NA>
#> 3           0.4             saw octopus
#> 4           0.8                    <NA>
#> 5           0.2 white balance corrected
#> 6            NA                    <NA>