This is a study note on R data types. Some additional information follows:
Data structure | Create | Indexing | Coerce |
---|---|---|---|
Vector | c() | [1:5] | as.vector() |
Matrix | matrix( , nrow = , ncol = ) | [1,1] | as.matrix() |
Array | array( , dim=c()) | [ , 2, , ] | as.array() |
List | list( , , , ,) | $ID , [["ID"]] or [[1]] , ["ID"] or [1] | as.list() |
Data frame | data.frame() | $ID , [["ID"]] , [1, 1] | as.data.frame() |
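The rows of the table above can be tried out with a short sketch. All object names and values here (v, m, arr, l, df) are made up for illustration, and the array is 3-dimensional rather than the 4-dimensional case hinted at by [ , 2, , ].

```r
# Create, index, and coerce each structure from the table (base R only).
v <- c(10, 20, 30, 40, 50)              # vector
v_sub <- v[1:3]                          # first three elements

m <- matrix(1:6, nrow = 2, ncol = 3)     # 2 x 3 matrix, filled column by column
m_11 <- m[1, 1]                          # row 1, column 1

arr <- array(1:24, dim = c(2, 3, 4))     # 2 x 3 x 4 array
arr_col2 <- arr[, 2, ]                   # column 2 of every slice, a 2 x 4 matrix

l <- list(ID = 1:3, label = "a")
l$ID        # the element itself (an integer vector)
l[["ID"]]   # same element as l$ID
l["ID"]     # a list of length 1 that still wraps the element

df <- data.frame(ID = 1:3, score = c(2.5, 3.5, 4.5))
df[1, 1]    # row 1, column 1

# Coercion
v_list <- as.list(v_sub)      # list with three numeric elements
m_df   <- as.data.frame(m)    # data frame with columns V1, V2, V3
m_vec  <- as.vector(m)        # drops the dim attribute
```

Note the difference between [[ ]] and [ ] on a list: the first extracts the element, the second returns a shorter list.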
# essential
library(tidyverse)
All objects can have arbitrary additional attributes, used to store metadata about the object. Attributes can be thought of as a named vector or list (with unique names). Attributes can be accessed individually with attr() or all at once (as a list) with attributes(). Most attributes are lost when a vector is modified. The only attributes not lost are the three most important:
- names(x): Names / dimnames, a character vector giving each element a name.
- dim(x): Dimensions, used to turn vectors into matrices and arrays, described in matrices and arrays.
- class(x): Class, used to implement the S3 object system, described in S3.

Some other attributes:
- metadata(): sets metadata on a Raster object.

function description | Vector | Matrix | Array |
---|---|---|---|
get names | names() | rownames() , colnames() | dimnames() |
get length | length() | nrow() , ncol() | dim() |
combine | c() | rbind() , cbind() | abind::abind() |
transpose | - | t() | aperm() |
check if type | is.null(dim(x)) | is.matrix() | is.array() |
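As a rough illustration of the table above, the sketch below applies the listed functions to small made-up objects. abind::abind() is left out because it comes from a separate package; everything else is base R.

```r
v   <- c(a = 1, b = 2, c = 3)
m   <- matrix(1:6, nrow = 2,
              dimnames = list(c("r1", "r2"), c("x", "y", "z")))
arr <- array(1:24, dim = c(2, 3, 4))

# get names
names(v)
rownames(m); colnames(m)
dimnames(arr)                 # NULL, because no dimnames were set

# get length
length(v)
nrow(m); ncol(m)
dim(arr)

# combine
v2     <- c(v, d = 4)
m_rows <- rbind(m, r3 = 7:9)      # adds a third row named "r3"
m_cols <- cbind(m, w = 10:11)     # adds a fourth column named "w"

# transpose
m_t   <- t(m)                     # 3 x 2
arr_p <- aperm(arr, c(2, 1, 3))   # swaps the first two dimensions

# check if type
is.null(dim(v))   # TRUE for a plain vector
is.matrix(m)
is.array(arr)
```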
function description | List | Matrix |
---|---|---|
get names | names() | rownames() , colnames() |
get length | length() | nrow() , ncol() |
combine | c() | rbind() , cbind() |
transpose | - | t() |
check if type | is.null(dim(x)) | is.matrix() |
You can name a vector in three ways:
- When creating it: x <- c(a = 1, b = 2, c = 3).
- By modifying an existing vector in place: x <- 1:3; names(x) <- c("a", "b", "c"), or x <- 1:3; names(x)[[1]] <- c("a").
- By creating a modified copy of a vector: x <- setNames(1:3, c("a", "b", "c")).

Adding a dim attribute to a vector allows it to behave like a 2-dimensional matrix or a multi-dimensional array.
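A small sketch of how attributes behave in practice. The attribute name "note" is a made-up example, and subsetting is used here as one operation that drops custom attributes while keeping names.

```r
x <- setNames(1:6, letters[1:6])
attr(x, "note") <- "my metadata"   # hypothetical custom attribute
attributes(x)                      # a list holding names and note

y <- x[1:2]                        # subsetting keeps names but drops "note"
attributes(y)

z <- 1:6
dim(z) <- c(2, 3)                  # adding dim makes the vector a 2 x 3 matrix
is.matrix(z)
```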
Class is a property assigned to an object that determines how generic functions operate with it. It is not a mutually exclusive classification. If an object has no specific class assigned to it, such as a simple numeric vector, its class is usually the same as its mode, by convention. Class is based on R’s object-oriented class hierarchy, shown below:
Data Type | Example | Verify: class() |
---|---|---|
Logical | TRUE, FALSE | v <- TRUE : logical |
Numeric | 12.3, 5, 999 | v <- 23.5 : numeric |
Integer | 2L, 34L, 0L | v <- 2L : integer |
Complex | 3 + 2i | v <- 2+5i : complex |
Character | "a", "good", "TRUE", "23.4" | v <- "TRUE" : character |
Raw | "Hello" is stored as 48 65 6c 6c 6f | v <- charToRaw("Hello") : raw |
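The table above can be checked directly with class(). The sketch below uses the same example values as the table and also shows a few is.*() predicates.

```r
vals <- list(TRUE, 23.5, 2L, 2+5i, "TRUE", charToRaw("Hello"))
classes <- vapply(vals, class, character(1))
classes        # one class name per value, in the order of the table rows

# is.*() predicates test for a type instead of naming it
is.logical("TRUE")      # FALSE: the quotes make it a character string
is.character("TRUE")    # TRUE
is.numeric(2L)          # TRUE: integers count as numeric

hello_raw <- charToRaw("Hello")
hello_raw               # prints the bytes in hexadecimal
```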
is.xxx(): checks whether the data type is xxx.

Dimensions | One type structure | Multiple types structure |
---|---|---|
1-Dimension | (Atomic) Vector | List |
2-Dimension | Matrix | Data frame |
n-Dimension | Array | |
- class(): an object’s object-oriented classification according to the R class hierarchy (high-level, e.g. data.frame).
- typeof(): the (R internal) type or storage mode of any object (low-level, e.g. list).
- mode(): a mutually exclusive classification of objects according to their basic structure, independent of their class (their position in the class hierarchy). The ‘atomic’ modes are numeric, complex, character and logical. Recursive objects have modes such as ‘list’ or ‘function’, among a few others. An object has one and only one mode.
- length(): how long is it? What about two-dimensional objects?
- attributes(): does it have any metadata?
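To see how these functions differ, the sketch below applies them to a Date, a data frame, and a matrix. The objects are made up for illustration; the last lines also answer the question about two-dimensional objects.

```r
d <- as.Date("2020-01-01")
class(d)     # "Date"    (high-level S3 class)
typeof(d)    # "double"  (internal storage)
mode(d)      # "numeric"

df <- data.frame(x = 1:3, y = c("a", "b", "c"))
class(df)    # "data.frame"
typeof(df)   # "list"
mode(df)     # "list"

m <- matrix(1:6, nrow = 2)
length(m)    # 6: a matrix counts every cell
length(df)   # 2: a data frame counts its columns

names(attributes(df))   # the metadata stored on the data frame
```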
Vector-type data support vectorized operations, which apply one operation to multiple pairs of operands at once.
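A minimal sketch of what this means, comparing one vectorized addition with an explicit loop over the same pairs. The vectors x and y are made-up examples.

```r
x <- c(1, 2, 3, 4)
y <- c(10, 20, 30, 40)

# Vectorized: one call adds every pair of elements
vec_sum <- x + y

# Equivalent explicit loop, one pair at a time
loop_sum <- numeric(length(x))
for (i in seq_along(x)) {
  loop_sum[i] <- x[i] + y[i]
}

identical(vec_sum, loop_sum)   # TRUE

# A length-1 operand is recycled across the longer vector
doubled <- x * 2
```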
Atomic vectors are usually created with c(), short for combine.
Atomic vectors are always flat, even if you nest c() calls.
- anyNA(): returns TRUE if the vector contains any missing values.
- is.na(): indicates which elements of a vector represent missing data.
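The points above can be checked with a short sketch. The values are made up; mean() with na.rm = TRUE is added only to show one common way of handling the missing value.

```r
# Nested c() calls still give a flat vector
nested <- c(1, c(2, c(3, 4)))
flat   <- c(1, 2, 3, 4)
identical(nested, flat)    # TRUE

# Missing values
v <- c(1, NA, 3)
anyNA(v)                   # TRUE
is.na(v)                   # FALSE TRUE FALSE
mean(v, na.rm = TRUE)      # 2, computed after dropping the NA
```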
a <- c(1:3) # integer vector
b <- c(FALSE, TRUE, FALSE) # logical vector
c <- c("one", "two", "three") # character vector
d <- seq(1:3) # integer vector (overwritten on the next line)
d <- seq(from = 1, to = 30, by = 10)
df <- data.frame(a=a,b=b,c=c,d=d)
str(df)
## 'data.frame': 3 obs. of 4 variables:
## $ a: int 1 2 3
## $ b: logi FALSE TRUE FALSE
## $ c: Factor w/ 3 levels "one","three",..: 1 3 2
## $ d: num 1 11 21
sapply(df, class)
## a b c d
## "integer" "logical" "factor" "numeric"
sapply(df, typeof)
## a b c d
## "integer" "logical" "integer" "double"
sapply(df, mode)
## a b c d
## "numeric" "logical" "numeric" "numeric"
Adding a dim attribute to an atomic vector allows it to behave like a multi-dimensional array. A special case of the array is the matrix, which has two dimensions. Matrices are used commonly as part of the mathematical machinery of statistics. Arrays are much rarer, but worth being aware of.
Matrices and arrays are created with matrix() and array(), or by using the assignment form of dim():
# create
## Two scalar arguments specify row and column sizes
a <- matrix(1:6, nrow = 2, ncol = 3)
## One vector argument to describe all dimensions
b <- array(1:12, c(2, 3, 2))
# You can also modify an object in place by setting dim()
c <- 1:6
dim(c) <- c(3, 2)
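As a quick check, the sketch below suggests that matrix() and the assignment form of dim() build the same object, and that values fill column by column unless byrow = TRUE is given. Names are illustrative.

```r
a <- matrix(1:6, nrow = 2, ncol = 3)

b <- 1:6
dim(b) <- c(2, 3)
identical(a, b)      # TRUE: same values, same dim attribute

# Default fill order is column by column
a[1, ]               # 1 3 5

# byrow = TRUE fills row by row instead
by_row <- matrix(1:6, nrow = 2, byrow = TRUE)
by_row[1, ]          # 1 2 3
```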
Lists are a step up in complexity from atomic vectors: each element can be any type, not just vectors. You construct lists by using list() instead of c().
Lists are built on top of vectors, whereas data frames and tibbles are built on top of lists. Compared with a list, therefore, a vector is lower-level, while data frames and tibbles are higher-level. The following is an example of the data object architecture, in which a fitted model object turns out to be a list:
mod <- glm(mpg ~ cyl + disp + hp + drat + wt, data = mtcars)
mode(mod)
## [1] "list"
a <- list(1:10) # list containing one integer vector
b <- list(1:3, "a", c(TRUE, FALSE, TRUE), c(2.3, 5.9))
str(b)
## List of 4
## $ : int [1:3] 1 2 3
## $ : chr "a"
## $ : logi [1:3] TRUE FALSE TRUE
## $ : num [1:2] 2.3 5.9
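A short sketch of how list elements are reached, and of how list() differs from c() when nesting. The element names (ids, label, flags) are made up for illustration.

```r
b <- list(ids = 1:3, label = "a", flags = c(TRUE, FALSE, TRUE))

# $ and [[ ]] extract the element itself
b$ids
b[["ids"]]
b[[1]]

# [ ] returns a sub-list that still wraps the element
b["ids"]
typeof(b[["ids"]])   # "integer"
typeof(b["ids"])     # "list"

# list() keeps nesting, c() flattens
nested_list <- list(1, list(2, 3))
flat_vec    <- c(1, c(2, 3))
length(nested_list)  # 2
length(flat_vec)     # 3
```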
A data frame is a named list of vectors with attributes for (column) names, row.names, and its class, data.frame.
- A data frame has rownames() and colnames(). The names() of a data frame are the column names.
- A data frame has nrow() rows and ncol() columns. The length() of a data frame gives the number of columns.
df1 <- data.frame(x = 1:3, y = letters[1:3])
print(df1)
## x y
## 1 1 a
## 2 2 b
## 3 3 c
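The properties listed above can be confirmed on the same small data frame. This is only a sketch; df1 is rebuilt here so the block runs on its own.

```r
df1 <- data.frame(x = 1:3, y = letters[1:3])

names(df1)       # column names
colnames(df1)    # same as names(df1)
rownames(df1)    # "1" "2" "3"

nrow(df1)        # 3
ncol(df1)        # 2
length(df1)      # 2: length counts columns, as for any list

typeof(df1)      # "list": a data frame is a list underneath
is.list(df1)     # TRUE
```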
One of the most important vector attributes is class, which underlies the S3 object system. Having a class attribute turns an object into an S3 object, which means it will behave differently from a regular vector when passed to a generic function. Every S3 object is built on top of a base type, and often stores additional information in other attributes. In this section, we’ll discuss four important S3 vectors used in base R: factors, dates, date-times, and durations.
The following is the schema of the S3 object system.
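A factor is a vector that can contain only predefined values, and is used to store categorical data.
- In R versions before 4.0.0, data.frame() and many data-loading functions convert character vectors to factors by default; use stringsAsFactors = FALSE to suppress this behaviour.
- Built-in character vectors: month.name, month.abb, state.name, state.abb.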
# creating
a <- factor(c("a", "b", "b", "a"))
levels(a)
## [1] "a" "b"
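To illustrate that a factor is an S3 object built on an integer vector, the sketch below inspects one and then builds a second factor whose levels come from the built-in month.name vector. The variable names f and months_f are made up.

```r
f <- factor(c("a", "b", "b", "a"))
typeof(f)       # "integer": the base type underneath
class(f)        # "factor":  the S3 class
unclass(f)      # integer codes plus a "levels" attribute

# Predefined levels: values that never occur are still counted
months_f <- factor(c("March", "January", "March"), levels = month.name)
nlevels(months_f)        # 12
as.integer(months_f)     # 3 1 3, the position of each value in month.name
tab <- table(months_f)
tab["February"]          # 0, because the level exists but was never observed
```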
(For dates, date-times, and durations, check the note for the lubridate package.)