R Statistics | Reading Data

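Besides typing data in by hand, R can import csv and txt files directly.

Most people key in their data with Excel or Notepad and then import the file into statistical software for analysis. In R, the two functions read.csv() and read.table() handle importing csv and txt files.

All of the examples mentioned here can be downloaded from the sample files.

Importing files

csv files

First, try importing customer.csv.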


     > data <- read.csv("c:/Users/USER/downloads/customer.csv", header = TRUE, sep = ",") #import the csv into data

     > dim(data) #show the number of cases and variables in data

     [1] 100  18 #data has 100 cases and 18 variables

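If the original csv was keyed in without variable names, set the header argument of read.csv() to FALSE to indicate that the file has no header row, and the data will import correctly.

customer_without_varname.csv is identical to customer.csv except that the variable names have been removed. Now import customer_without_varname.csv, taking care to set header=FALSE, and store the result in an object named data_novar: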


     > data_novar <- read.csv("c:/Users/USER/downloads/customer_without_varname.csv", header = FALSE, sep = ",") #import the csv into data_novar

     > dim(data_novar) #show the number of cases and variables in data_novar

     [1] 100  18 #data_novar has 100 cases and 18 variables

The head() function shows the difference between having and not having variable names (a header row):


     > head(data) #show the first 6 rows of data

       region gender age edcat jobcat employ income jobsat marital pets_cats pets_dogs pets_birds pets_small pets_saltfish pets_freshfish homeown cardspent card2spent
     1      1      1  20     3      1      0     31      1       0         0         0          0          0             0              0       0     81.66      67.80
     2      5      0  22     4      2      0     15      1       0         0         0          0          0             0              6       1     42.60      34.94
     3      3      1  67     2      2     16     35      4       1         2         1          0          0             0              0       1    184.22     175.75
     4      4      0  23     3      2      0     20      2       1         0         0          0          0             0              0       1    340.99      18.42
     5      2      0  26     3      2      1     23      1       1         0         0          0          0             0              0       0    255.10     252.73
     6      4      0  64     4      3     22    107      2       0         1         1          0          2             0              7       1    228.27       0.00


     > head(data_novar) #show the first 6 rows of data_novar

           V1     V2  V3    V4     V5     V6     V7     V8      V9       V10       V11        V12        V13           V14            V15     V16       V17        V18
     1      1      1  20     3      1      0     31      1       0         0         0          0          0             0              0       0     81.66      67.80
     2      5      0  22     4      2      0     15      1       0         0         0          0          0             0              6       1     42.60      34.94
     3      3      1  67     2      2     16     35      4       1         2         1          0          0             0              0       1    184.22     175.75
     4      4      0  23     3      2      0     20      2       1         0         0          0          0             0              0       1    340.99      18.42
     5      2      0  26     3      2      1     23      1       1         0         0          0          0             0              0       0    255.10     252.73
     6      4      0  64     4      3     22    107      2       0         1         1          0          2             0              7       1    228.27       0.00
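
The effect of the header argument can also be checked without the sample files. The following is a minimal sketch that writes a small, made-up csv to a temporary file (the three rows are hypothetical, not taken from customer.csv) and reads it both ways. It also shows what goes wrong when header=FALSE is used on a file that does have a header row.

```r
# Write a tiny csv to a temporary file (hypothetical data, not customer.csv)
tmp <- tempfile(fileext = ".csv")
writeLines(c("region,gender,age",
             "1,1,20",
             "5,0,22",
             "3,1,67"), tmp)

with_header <- read.csv(tmp, header = TRUE)   # first line becomes the column names
as_data     <- read.csv(tmp, header = FALSE)  # first line is read as a data row

names(with_header)           # the names taken from the file
names(as_data)               # the automatic names V1, V2, V3
nrow(with_header)            # 3 data rows
nrow(as_data)                # 4 rows: the header line is now counted as data
is.numeric(with_header$age)  # TRUE
is.numeric(as_data$V3)       # FALSE: the text "age" is mixed in with the numbers

unlink(tmp)                  # remove the temporary file
```

An unexpected extra row, or columns that are no longer numeric, is therefore a hint that the header setting does not match the file.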

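In fact, a quicker way is to copy the data in the Excel spreadsheet with Ctrl+C and then paste it into R with read.table("clipboard"). The "clipboard" file name is available on Windows; if the copied range includes a header row, header=TRUE has to be added to the call: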


     > data<-read.table("clipboard")
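
Because the clipboard cannot be reproduced in a script, the sketch below stands in for it with a literal string passed through the text argument of read.table(). It assumes that Excel separates copied cells with tabs and that the copied range includes a header row; the values are made up.

```r
# Text laid out the way a copied spreadsheet range usually is:
# cells separated by tabs, rows separated by newlines (hypothetical values).
copied <- "region\tgender\tage\n1\t1\t20\n5\t0\t22\n3\t1\t67"

# text = reads from the string instead of a file or the clipboard
pasted <- read.table(text = copied, header = TRUE, sep = "\t")
pasted
```

Under the same assumptions, the Windows call would be read.table("clipboard", header = TRUE, sep = "\t"). Setting sep = "\t" explicitly keeps empty cells from shifting the columns, which can happen with the default whitespace separator.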

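txt files

The same approach applies to read.table(). Take importing customer_txt.txt as an example; note that this file uses a semicolon as its separator: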


     > data_txt <- read.table("c:/Users/USER/downloads/customer_txt.txt", header = TRUE, sep = ";") #import the txt into data_txt

     > dim(data_txt) #show the number of cases and variables in data_txt

     [1] 10  7 #data_txt has 10 cases and 7 variables

When there are no variable names, customer_txt_without_varname.txt can likewise be imported with header=FALSE:


     > data_txt_novar <- read.table("c:/Users/USER/downloads/customer_txt_without_varname.txt", header = FALSE, sep = ";") #import the txt into data_txt_novar

     > dim(data_txt_novar) #show the number of cases and variables in data_txt_novar

     [1] 10  7 #data_txt_novar has 10 cases and 7 variables

Compare the results of header=TRUE and header=FALSE:


     > head(data_txt)

       region gender age edcat jobcat employ income
     1      1      1  20     3      1      0     31
     2      5      0  22     4      2      0     15
     3      3      1  67     2      2     16     35
     4      4      0  23     3      2      0     20
     5      2      0  26     3      2      1     23
     6      4      0  64     4      3     22    107


     > head(data_txt_novar)

       V1 V2 V3 V4 V5 V6  V7
     1  1  1 20  3  1  0  31
     2  5  0 22  4  2  0  15
     3  3  1 67  2  2 16  35
     4  4  0 23  3  2  0  20
     5  2  0 26  3  2  1  23
     6  4  0 64  4  3 22 107
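
The sep argument has to match the separator actually used in the file. As a rough illustration, the sketch below writes a small semicolon-separated file to a temporary location (made-up values, not customer_txt.txt) and reads it once with the right separator and once with the wrong one.

```r
# A small semicolon-separated file (hypothetical data)
tmp <- tempfile(fileext = ".txt")
writeLines(c("region;gender;age",
             "1;1;20",
             "5;0;22"), tmp)

right <- read.table(tmp, header = TRUE, sep = ";")  # matches the file
wrong <- read.table(tmp, header = TRUE, sep = ",")  # does not match the file

dim(right)  # 2 rows, 3 columns
dim(wrong)  # 2 rows, 1 column: each line was read as a single value

unlink(tmp) # remove the temporary file
```

When dim() reports a single column for a file that should have several, the separator is the first thing to check.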

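SPSS files

Base R cannot import SPSS sav files directly; the foreign package is needed. foreign is a recommended package that ships with most R installations, so it usually only has to be loaded.

library() loads an installed package (downloading a package is the job of install.packages()). Once foreign is loaded, the read.spss() function becomes available: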


     > library(foreign) #load the foreign package

     > spss <- read.spss("c:/Users/USER/downloads/customer_sav.sav", to.data.frame = TRUE) #import customer_sav

     > dim(spss) #show the number of cases and variables in spss

     [1] 100  18

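Besides foreign, the read_sav() function in the haven package can also import sav files. haven is not bundled with R, so it has to be installed once with install.packages("haven") before it can be loaded.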


     > library("haven")

     > spss<-read_sav("c:/Users/USER/downloads/customer_sav.sav")
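
Since foreign and haven both have to be installed before library() can load them, it can help to check first. The following is a small sketch that only reports which of the two packages are available; the install step is left commented out so that nothing is downloaded unless the reader chooses to.

```r
# Packages used in this tutorial for reading other software's file formats
pkgs <- c("foreign", "haven")

# requireNamespace() returns TRUE if a package is installed and can be loaded
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
print(installed)

# Uncomment to install whatever is missing:
# install.packages(names(installed)[!installed])
```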

SAS files

After loading foreign or haven, SAS transport (xpt) files can be imported with read.xport() or read_xpt() respectively:


     > library("foreign")

     > sas<-read.xport("c:/Users/USER/downloads/customer_xpt.xpt")


     > library("haven")

     > sas<-read_xpt("c:/Users/USER/downloads/customer_xpt.xpt")

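Stata files

Stata dta files can be imported with read.dta() from foreign or read_dta() from haven. Note that read.dta() only supports files saved in Stata version 5 to 12 formats, so read_dta() is the option for newer files: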


     > library("foreign")

     > stata<-read.dta("c:/Users/USER/downloads/customer_dta.dta")


     > library("haven")

     > stata<-read_dta("c:/Users/USER/downloads/customer_dta.dta")

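dBase files

Some data are saved directly in a database format once collection is complete; read.dbf() from the foreign package can import the dbf file format.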


     > library("foreign")

     > dBase<-read.dbf("c:/Users/USER/downloads/customer_dbf.dbf")
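
For readers without a dbf file at hand, read.dbf() can be tried on a file produced by its counterpart write.dbf(), which is also part of foreign. The sketch below is a round trip through a temporary file; the data frame is made up for illustration.

```r
library(foreign)

# Hypothetical three-row data frame, written out as a .dbf file and read back
df  <- data.frame(region = c(1L, 5L, 3L), age = c(20L, 22L, 67L))
tmp <- tempfile(fileext = ".dbf")
write.dbf(df, tmp)

back <- read.dbf(tmp)
dim(back)    # same number of rows and columns as df

unlink(tmp)  # remove the temporary file
```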

Naming variables

R labels variables that have no names V1, V2, V3, and so on. The names() function can be used to assign names to them:


     > names(data_txt_novar)<-c("region", "gender", "age", "edcat", "jobcat", "employ", "income")

     > head(data_txt_novar)
       region gender age edcat jobcat employ income
     1      1      1  20     3      1      0     31
     2      5      0  22     4      2      0     15
     3      3      1  67     2      2     16     35
     4      4      0  23     3      2      0     20
     5      2      0  26     3      2      1     23
     6      4      0  64     4      3     22    107
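
As an alternative to renaming after the fact, read.table() and read.csv() accept a col.names argument, so the names can be supplied during the import itself. The sketch below compares the two routes on a made-up, headerless, semicolon-separated string read through the text argument.

```r
raw  <- "1;1;20\n5;0;22\n3;1;67"       # hypothetical headerless data
vars <- c("region", "gender", "age")

# Option 1: import first, then assign names with names()
a <- read.table(text = raw, header = FALSE, sep = ";")
names(a) <- vars

# Option 2: supply the names while importing
b <- read.table(text = raw, header = FALSE, sep = ";", col.names = vars)

identical(names(a), names(b))  # both routes give the same column names
```

The vector passed to col.names must contain one name per column.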

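Listing and removing data

So far, setting aside the objects created in the SPSS, SAS, Stata, and dBase examples, we have imported four datasets: data, data_novar, data_txt, and data_txt_novar. Each of these objects can be called, accessed, or analyzed on its own, and deleted when it is no longer needed.

ls() lists the objects in the workspace, and rm() deletes them.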


     > ls() #list the objects in the workspace

     [1] "data"           "data_novar"     "data_txt"       "data_txt_novar"


     > ls(data_txt) #list the variable names in data_txt, sorted alphabetically (names(data_txt) keeps the original order)

     [1] "age"    "edcat"  "employ" "gender" "income" "jobcat" "region"

Objects that are no longer needed can be deleted with rm():


    > rm(data)

    > ls()

    [1] "data_novar"     "data_txt"       "data_txt_novar"


    > rm(list = ls()) #delete all objects

    > ls()

    character(0)
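
Because rm(list = ls()) wipes the whole workspace, it may be safer to practice in a separate environment first. The sketch below repeats the listing and removal steps inside a scratch environment created with new.env(); the object names mirror the ones used above, but the contents are placeholders.

```r
# A scratch environment, so the demonstration does not touch the real workspace
scratch <- new.env()
assign("data", data.frame(x = 1:3), envir = scratch)
assign("data_txt", data.frame(y = 4:6), envir = scratch)

ls(envir = scratch)                              # both objects are listed
rm("data", envir = scratch)                      # remove a single object
ls(envir = scratch)                              # only data_txt remains
rm(list = ls(envir = scratch), envir = scratch)  # remove everything that is left
length(ls(envir = scratch))                      # 0
```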
