22-04-12 (Data Science)

오로시 2022. 4. 12. 12:59

HW4 feedback

 

What I was curious about: problem 10

order --> returns indices

sort --> returns values

 

Professor, this is hard~

 

A question came up.

 

The professor's mistake

 

 

-------- R function summary --------

[2-1]

? : help command

install.packages("package name") : installs packages

library("package name") : loads packages

Arithmetic operators : + , - , * , / , %%

Relational operators : <, > ,<= , >= , == , !=

Logical operators : &, | , !

c() : fundamental function for creating "vector"

as.numeric()

as.integer()

as.double()

as.character()

as.factor()

is.numeric()

is.integer()

is.double()

is.character()

is.factor()

is.na()

typeof()

factor()

str()

levels() : works on factors; to add a level, use c(), e.g. levels(f) <- c(levels(f), "new")

summary() : quick overview
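A small sketch of the type helpers and factor functions listed above. The variable names (x, nums, f) are made up for illustration.

```r
# Type conversion and type checks
x <- c("3", "1", "2")
typeof(x)                 # "character"
nums <- as.numeric(x)     # 3 1 2
is.numeric(nums)          # TRUE
is.na(c(1, NA, 3))        # FALSE TRUE FALSE

# Factors: levels are sorted alphabetically by default
f <- factor(c("low", "high", "low"))
levels(f)                 # "high" "low"
typeof(f)                 # "integer": a factor stores integer codes

# Add a new (still unused) level by extending the levels with c()
levels(f) <- c(levels(f), "mid")

str(f)      # compact structure
summary(f)  # counts per level
```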

0d : scalar (a length-1 vector in R) , 1d : vector , 2d : matrix, data frame , list : general container for any objects (3d and higher use array)

DF[row-index, column-index]

Vector Index : Minus Index means "except for"

names(vector) <- c("~","~~")

seq(starting value, end value, gap)

rep(value, times or each)

intersect(vectorA, vectorB)

union(vectorA, vectorB)

setdiff(vectorA, vectorB)

unique(vectorA)

sum() : a logical expression can go inside the parentheses (it counts the TRUE values)

mean()
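The vector functions in [2-1] can be tried together as below. This is only a sketch; the names and scores are invented for the example.

```r
# Build a named numeric vector with c() and names()
scores <- c(90, 75, 60)
names(scores) <- c("kim", "lee", "park")

scores[-1]                    # minus index: everything except the first element
seq(1, 10, 3)                 # 1 4 7 10
rep(c("a", "b"), times = 2)   # "a" "b" "a" "b"
rep(c("a", "b"), each = 2)    # "a" "a" "b" "b"

# Set operations on vectors
a <- c(1, 2, 3, 4)
b <- c(3, 4, 5)
intersect(a, b)     # 3 4
union(a, b)         # 1 2 3 4 5
setdiff(a, b)       # 1 2
unique(c(1, 1, 2))  # 1 2

# A logical expression inside sum() counts the TRUE values
n_pass <- sum(scores >= 70)   # 2
avg <- mean(scores)           # 75
```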

 

[2-2]

matrix : elements of the same data type

matrix(X, nrow, byrow, ...) : if byrow = TRUE, values are filled row by row (across); if FALSE, column by column (down). FALSE is the default. nrow sets the number of rows of the matrix.

rownames() : name of rows

colnames() : name of columns

rowSums()

colSums()

rowMeans()

colMeans()

cbind() : combine by columns

rbind() : combine by rows

matrix[rows,columns]
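A quick sketch of the matrix functions above, mainly to see the effect of byrow. The object names are assumptions for the example.

```r
# byrow = TRUE fills across each row; the default (FALSE) fills down each column
m_row <- matrix(1:6, nrow = 2, byrow = TRUE)
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6
m_col <- matrix(1:6, nrow = 2)
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

rownames(m_row) <- c("r1", "r2")
colnames(m_row) <- c("a", "b", "c")

rowSums(m_row)    # r1 = 6, r2 = 15
colMeans(m_row)   # a = 2.5, b = 3.5, c = 4.5

# cbind() adds a column, rbind() adds a row
wider  <- cbind(m_row, d = c(7, 8))
taller <- rbind(m_row, r3 = c(7, 8, 9))

m_row["r2", "b"]  # 5
m_row[1, ]        # the whole first row
```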

DataFrame --> can store several data types (a vector or matrix can hold only one data type) , list of vectors

head() 

tail()

str()

data.frame() : vectors of same length and possibly different type

data.frame(vectors)

data.frame(varname1= c('values') , varname2 = c('values') , ... )

DataFrame[rows,columns]

planets_df[,3] == planets_df[,"diameter"] == planets_df$diameter

List --> a data type that can hold vectors, matrices, data frames, other lists, and so on.
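To check that the three ways of picking a column give the same result, here is a sketch with a tiny planets_df. The rows and values are illustrative, not the data set used in class.

```r
# Vectors of the same length but different types become the columns
planets_df <- data.frame(
  name     = c("Mercury", "Venus", "Earth"),
  type     = c("Terrestrial", "Terrestrial", "Terrestrial"),
  diameter = c(0.382, 0.949, 1.000),
  stringsAsFactors = FALSE
)

by_position <- planets_df[, 3]
by_name     <- planets_df[, "diameter"]
by_dollar   <- planets_df$diameter
identical(by_position, by_name)   # TRUE
identical(by_name, by_dollar)     # TRUE

# Rows and columns at once: names of planets with diameter below 1
small <- planets_df[planets_df$diameter < 1, "name"]   # "Mercury" "Venus"

head(planets_df, 2)
str(planets_df)
```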

list()

list(ListName = ListElement)

names(list) <- c(~~)

Accessing a list component : list[[n]] , list[['ListName']] , list$ListName
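A sketch of the three access styles for a list. my_list and its component names are made up for the example.

```r
# A list can mix vectors, matrices, and data frames
my_list <- list(
  vec = 1:3,
  mat = matrix(1:4, nrow = 2),
  df  = data.frame(x = 1:2)
)

# Three equivalent ways to reach the first component
my_list[[1]]
my_list[["vec"]]
my_list$vec

# Single brackets keep the list wrapper: the result is a list of length 1
wrapped <- my_list[1]

# Rename the components
names(my_list) <- c("v", "m", "d")
my_list$v   # 1 2 3
```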

 

 

[03]

read.csv() --> has a stringsAsFactors option.

read.table("file", sep, col.names)

write.csv(dataframe, "filedirectory/file", row.names = T/F)

write.table()

save(variables, file="FileDirectory/FileName.RData")

load(file='filedirectory/filename.RData')
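The read/write functions can be tried without touching real files by writing into tempdir(). A sketch; the file names and the data frame are assumptions for the example.

```r
df <- data.frame(id = 1:3, grade = c("A", "B", "A"))

# CSV round trip; row.names = FALSE avoids an extra column of row names
csv_path <- file.path(tempdir(), "grades.csv")
write.csv(df, csv_path, row.names = FALSE)

# stringsAsFactors decides whether text columns come back as character or factor
df_chr <- read.csv(csv_path, stringsAsFactors = FALSE)
df_fct <- read.csv(csv_path, stringsAsFactors = TRUE)
class(df_chr$grade)   # "character"
class(df_fct$grade)   # "factor"

# save() stores R objects as they are; load() restores them under the same name
rdata_path <- file.path(tempdir(), "grades.RData")
save(df, file = rdata_path)
rm(df)
load(file = rdata_path)
nrow(df)   # 3
```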

FunctionName <- function(parameter) {

  operation

}

if (condition) {

  operation

}

if (condition) {

operation1

} else {

operation2

}

for(var in seq) {

operation

}

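ifelse(test, yes, no) : returns yes where test is TRUE and no where it is FALSE, element by element

apply(X, MARGIN, FUN, ...) : X (matrix or data frame) // MARGIN = 1 (by row), MARGIN = 2 (by column) // FUN (function)

runif(n) : creates n random numbers (uniform on [0, 1] by default)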
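The skeletons above (function, if/else, for, ifelse) filled in with a concrete case. grade_of and the cutoff of 60 are hypothetical, chosen only for illustration.

```r
# A function with an if / else branch
grade_of <- function(score) {
  if (score >= 60) {
    "pass"
  } else {
    "fail"
  }
}

scores <- c(45, 60, 88)

# Loop version: call the function once per element
results <- character(length(scores))
for (i in seq_along(scores)) {
  results[i] <- grade_of(scores[i])
}

# Vectorized version: ifelse() tests every element at once
results_vec <- ifelse(scores >= 60, "pass", "fail")

identical(results, results_vec)   # TRUE
```

if () only looks at a single condition, which is why the loop is needed there; ifelse() handles the whole vector in one call.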

 

lapply(x, fun , ...) : it returns a "list"

sapply(x, fun, ...) : it returns a "vector" or "matrix" when the results can be simplified ,, a function is passed as an argument

tapply(x, grp_var, fun, ...) : apply fun to x after grouping with grp_var
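A side-by-side sketch of the apply family, to see what each one returns. The small matrix and list are made up; mtcars is the built-in data set.

```r
m <- matrix(1:6, nrow = 2)   # columns are (1,2), (3,4), (5,6)

# apply(): MARGIN = 1 works on rows, MARGIN = 2 on columns
row_tot <- apply(m, 1, sum)   # 9 12
col_tot <- apply(m, 2, sum)   # 3 7 11

lst <- list(a = 1:3, b = 4:6)

# lapply() always returns a list
l_out <- lapply(lst, mean)    # list(a = 2, b = 5)

# sapply() simplifies the same result to a named vector
s_out <- sapply(lst, mean)    # a = 2, b = 5

# tapply(): group mpg by number of cylinders, then take the mean per group
t_out <- tapply(mtcars$mpg, mtcars$cyl, mean)
names(t_out)                  # "4" "6" "8"

# runif(): n random numbers, uniform on [0, 1] by default
u <- runif(5)
```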

 

[04]

aggregate(var1 ~ var2, data = X, FUN = func, ...) : applies func to var1 of X (data frame) after grouping by var2 --> so it is an alternative to tapply. It returns the result as a data frame.

 

aggregate(var1 ~ (var2 + var3), data = X, FUN = func, ...) : func is applied to var1 for each combination of var2 and var3

aggregate(cbind(var1, var2) ~ var3, data = X, FUN = func) : func is applied to both var1 and var2, grouped by var3.
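The three aggregate() forms above, sketched on the built-in mtcars data. The result names (agg1, agg2, agg3) are assumptions.

```r
# One grouping variable: mean mpg per cyl
agg1 <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

# Two grouping variables: mean mpg per (cyl, am) combination
agg2 <- aggregate(mpg ~ cyl + am, data = mtcars, FUN = mean)

# Two response variables: mean mpg and mean hp per cyl
agg3 <- aggregate(cbind(mpg, hp) ~ cyl, data = mtcars, FUN = mean)

# Same numbers as tapply(), but returned as a data frame
from_tapply <- as.vector(tapply(mtcars$mpg, mtcars$cyl, mean))
all.equal(agg1$mpg, from_tapply)   # TRUE

class(agg1)    # "data.frame"
names(agg3)    # "cyl" "mpg" "hp"
```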

 

order(..., decreasing = TRUE or FALSE) : returns a vector of indices ,, ascending is the default

sort(X,decreasing =TRUE or FALSE) : returns a sorted vector of values
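This was the HW4 question, so here is a sketch of the difference: order() gives positions, sort() gives values. The vector x is made up.

```r
x <- c(30, 10, 20)

ord <- order(x)   # 2 3 1: positions of the smallest, next, largest
srt <- sort(x)    # 10 20 30: the values themselves

# Indexing with order() reproduces sort()
identical(x[ord], srt)   # TRUE

order(x, decreasing = TRUE)   # 1 3 2

# order() is what you need to sort a whole data frame by one column
by_mpg <- mtcars[order(mtcars$mpg, decreasing = TRUE), ]
head(by_mpg, 3)
```

sort() alone cannot reorder a data frame, because it throws away the positions that the other columns need.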

 

sample(X, size, replace = FALSE, ...) : random sampling ,, with replace = TRUE the same element can be drawn more than once.

nrow()

ncol()

set.seed(x)
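A sketch of sample() together with set.seed(), nrow() and ncol(). The seed values are arbitrary, and no particular drawn numbers are assumed.

```r
# The same seed gives the same draw
set.seed(42)
first <- sample(1:10, 3)
set.seed(42)
second <- sample(1:10, 3)
identical(first, second)   # TRUE

# replace = TRUE allows repeats, so the sample can be larger than the population
boot <- sample(1:3, 10, replace = TRUE)

# Draw 5 random rows of a data frame by sampling row indices
set.seed(1)
idx  <- sample(nrow(mtcars), 5)
rows <- mtcars[idx, ]
nrow(rows)   # 5
ncol(rows)   # 11
```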

split(df, split_var, ...) : splits a data frame into a list of data frames by the split variable; split_var can also be a logical expression.

subset(df, condition, select, ...) : keeps the rows that match condition; select picks the columns you want.

subset(mtcars, mpg >25) == mtcars[mtcars$mpg >25 ,]

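ex) subset(mtcars, mpg > 20, select = c(mpg, hp)) : no quotes around the column names. To drop specific columns instead, use -c(colname1, colname2)

merge(df1, df2, by, all) : merges two data frames at a time; all = TRUE --> full (outer) join , all = FALSE --> inner join (default)

which(condition) : returns the indices where the logical condition is TRUE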
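split() and subset() sketched on mtcars. The result names are assumptions for the example.

```r
# split(): one data frame per value of the split variable
by_cyl <- split(mtcars, mtcars$cyl)
names(by_cyl)    # "4" "6" "8"

# A logical expression also works as the split variable
by_flag <- split(mtcars, mtcars$mpg > 25)
names(by_flag)   # "FALSE" "TRUE"

# subset() and bracket indexing select the same rows
s1 <- subset(mtcars, mpg > 25)
s2 <- mtcars[mtcars$mpg > 25, ]
all.equal(s1, s2)   # TRUE

# select keeps columns (no quotes); a minus sign drops them
s3 <- subset(mtcars, mpg > 20, select = c(mpg, hp))
s4 <- subset(mtcars, mpg > 20, select = -c(mpg, hp))
names(s3)   # "mpg" "hp"
```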
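A sketch of how the all argument changes the join. The two small data frames are invented; all.x is shown as an extra option beyond what the note above lists.

```r
students <- data.frame(id = c(1, 2, 3), name = c("kim", "lee", "park"))
scores   <- data.frame(id = c(2, 3, 4), score = c(80, 95, 70))

# Default (all = FALSE): only ids found in both data frames
inner <- merge(students, scores, by = "id")               # ids 2, 3

# all = TRUE: every id from either side, NA where there is no match
full <- merge(students, scores, by = "id", all = TRUE)    # ids 1, 2, 3, 4

# all.x = TRUE: keep every row of the first data frame
left <- merge(students, scores, by = "id", all.x = TRUE)  # ids 1, 2, 3
```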

which.max(x)

which.min(x)
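A sketch of which(), which.max() and which.min(). The vector x is made up.

```r
x <- c(5, 12, 7, 20, 3)

idx_big <- which(x > 6)    # 2 3 4: positions where the condition is TRUE
i_max   <- which.max(x)    # 4: position of the largest value
i_min   <- which.min(x)    # 5: position of the smallest value

# Typical use: find the row that holds the maximum of a column
best_car <- rownames(mtcars)[which.max(mtcars$mpg)]   # "Toyota Corolla"
```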

cut(x,breaks, right, ...)

quantile(x, probs, ...)

combination of quantile() and cut()

cut_points <- quantile(mtcars$mpg, c(0,0.25,0.75,1))

mtcars$fuel_efficiency <- cut(mtcars$mpg, breaks = cut_points, include.lowest = TRUE)

levels(mtcars$fuel_efficiency) <- c("low25perc","normal","high25perc")
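A sketch of why include.lowest = TRUE matters in the lines above: with quantile-based breaks the minimum value sits exactly on the lowest break, and cut() leaves it as NA otherwise. ages and the other names are assumptions for the example.

```r
# right decides which side of each interval is closed
ages <- c(10, 20, 30)
bins_right <- cut(ages, breaks = c(0, 20, 40))                 # (0,20] (0,20] (20,40]
bins_left  <- cut(ages, breaks = c(0, 20, 40), right = FALSE)  # [0,20) [20,40) [20,40)

# quantile() returns the values at the requested probabilities
q <- quantile(c(1, 2, 3, 4, 5), probs = c(0.25, 0.5))   # 25% = 2, 50% = 3

# Quantile breaks on mtcars$mpg, with and without include.lowest
cut_points  <- quantile(mtcars$mpg, c(0, 0.25, 0.75, 1))
no_lowest   <- cut(mtcars$mpg, breaks = cut_points)
with_lowest <- cut(mtcars$mpg, breaks = cut_points, include.lowest = TRUE)

sum(is.na(no_lowest))     # at least 1: the minimum mpg is not inside (min, ...]
sum(is.na(with_lowest))   # 0
table(with_lowest)        # how many cars fall in each bin
```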

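paste(x1, x2, sep, collapse, ...) : concatenates several values into one string, separated by a space by default

paste0(x1, x2, collapse, ...) : same, but with no separator (paste0 has no sep argument)

gsub(pattern, replacement, x) : replaces every match of pattern in x; x is a character vector (for a data frame, apply it to one column at a time)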
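A sketch of the string functions above. The example strings are made up.

```r
p1 <- paste("data", "science")                 # "data science"
p2 <- paste("data", "science", sep = "-")      # "data-science"
p3 <- paste0("hw", 4)                          # "hw4"

# collapse joins the elements of one vector into a single string
p4 <- paste(c("a", "b", "c"), collapse = "+")  # "a+b+c"

# gsub() replaces every match in each element of a character vector
g <- gsub("a", "A", c("banana", "data"))       # "bAnAnA" "dAtA"
```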

plot()

table()

 

-----

[6]

key functions for data exploration

head()

class() --> tells whether the object is a data frame, a vector, etc.

dim()

names()

str()

summary()

hist()

plot()

 

tidyr::

gather(data,key,value) <-> spread(data, key, value)

separate(data , col , into , sep) <-> unite(data, col , sep)
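A sketch of the four tidyr functions as inverse pairs. It assumes the tidyr package is installed and skips quietly if it is not; the data frames are invented for the example.

```r
# Assumption: tidyr is installed; otherwise nothing below runs
if (requireNamespace("tidyr", quietly = TRUE)) {
  wide <- data.frame(id = c(1, 2), x = c(10, 20), y = c(30, 40))

  # gather(): columns x and y become (variable, value) pairs, wide -> long
  long <- tidyr::gather(wide, key = "variable", value = "value", x, y)

  # spread(): the reverse, long -> wide
  back <- tidyr::spread(long, key = variable, value = value)

  dates <- data.frame(ym = c("2022-04", "2022-05"))

  # separate(): one column split into several at the separator
  parts <- tidyr::separate(dates, ym, into = c("year", "month"), sep = "-")

  # unite(): the reverse, several columns pasted back into one
  joined <- tidyr::unite(parts, "ym", year, month, sep = "-")
}
```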

 

tolower(x)

toupper(x)

 

 

 

 

 

 

 

 

 

 
