Upload files in R

Upload files from Excel, STATA, SAS, SPSS and text

First set the working directory (or check it)

getwd() # get working directory
 [1] "/Users/me/My Folder/"
 setwd("./My Subfolder/") # set working directory

1. .csv and .txt files

the read.csv function has many options, some of them are header=T which sets the first line as column names, sep=“,” the field separator character (in this case the semicolon), dec=“.” decimal sep character, skip=2 number of lines to skip (in this case 2).

read.csv2 is identical to read.csv except it assumes commas to be the decimal operators and semicolon as field separator

read.table works similarly to read.csv, but reads text files.

 mydata <- read.csv("mydata.csv", header=T)

When importing data in R, if any column’s name is a number, R will add an X to it (as in general it is a very bad idea to have numbers for column names, but can be handy). You can replace column names with:

 colnames(mydata) <- c("name1", "name2", "name3", "2017", "2018", "2019")

If you change or add anything to your data and want to save it then ( write.table for txt output):

 write.csv(mydata, "mydata.csv", row.names=FALSE)

2. STATA files .dta

 write.dta(mydata, "mydata.dta")

3. SPSS files .sav

use.value.labels by default is TRUE and converts value labels into factors. The mydata.txt is the name for data output, while the mydata.sps is the code output.

 mydata <- read.spss("mydata", to.data.frame=T, use.value.labels = FALSE)
 write.foreign(as.data.frame(mydata), "mydata.txt", "mydata.sps", package="SPSS")

4. SAS files .sas

Note that by default it converts value labels into factors

## to read from SAS
 mydata <- sasxport.get("mydata.xpt")

## to save in SAS format
 write.foreign(as.data.frame(mydata), "mydata.txt", "mydata.sas", package="SAS")

5. Excel spreadsheet

# library(xlsx)
 mydata <- read.xlsx("c:/myexcel.xlsx", 1) # 1 refers to the first worksheet-page altrenatively...
 mydata <- read.xlsx("c:/myexcel.xlsx", sheetName="Data input")
 write.xlsx(mydata, "mydata.xlsx")
# library(readxl)
mydata <-system.file("mypath/myexcel.xlsx", package = "readxl")
mydata <- read_excel(mydata, 1)

(A few) quick tricks

# head(mydata, n=10) # first 10 rows
 tail(mydata, n=10) # last 10 rows
 mydata[1,1:10] # print first row and first 10 columns
 names(mydata) # variable names
 nrow(mydata) # number of rows
 ncol(mydata) # number of columns
 str(mydata) # list structure of data
 class(mydata) # class of data
 view(mydata) # opens viewer window

Author: acarioli

is a PostDoc at the Geography and Environment department of the University of Southampton, WorldPop project team. She is also affiliated researcher at CED, UAB and Dondena Centre. Her interests include spatial econometrics and modeling, bayesian methods, machine learning processes, forecasting, micro-data simulation, and data visualization. Demo-traveler, Mac enthusiast, R zealot and Rladies member.