nettskjemar connects to version 2 of the Nettskjema API, and its main purpose is to download data from a form into R. Once you have created a Nettskjema API user and set up your R environment locally, you can start accessing your forms.
The functions that download data also have an option to turn off the codebook, i.e. to return the data with the original questions as column names, but this is not recommended. Data in this format is unpredictable to work with in R, and we cannot guarantee that the functions in this package will behave as expected.
If you are using this package, you are therefore strongly advised to turn on the codebook for your form in the Nettskjema portal and to set up a codebook for the entire form. You can toggle the codebook by going to the Nettskjema portal and opening your form. Then proceed to “Settings” and then “General settings”, and make sure “Codebook activated” is set to “Yes”.
The data returned by this package is designed to be tidyverse-compatible. Those who are familiar with the tidyverse should therefore find it fairly easy to work with the data retrieved by this package. If you want to learn about the tidyverse and how to use it, there are excellent resources on the Tidyverse webpage.
At the core of nettskjemar is the ability to download the submitted answers to a form into a tibble (a variation of a data.frame) with nettskjema_get_data().
Form 123823 has 4 responses to download.
# A tibble: 4 × 18
form_id submission_id attachment_1 attachment_2 checkbox checkbox_matrix…
<dbl> <chr> <chr> <chr> <chr> <chr>
1 123823 16785801 NA NA 1;2 1;2
2 123823 16779763 -9j5dzy.jpg amm_mowinckel_… 2 1
3 123823 16509317 Screenshot 2021-01-22… NA 1 1
4 123823 16508664 marius.jpeg NA 1 1;2
# … with 12 more variables: checkbox_matrix_2 <chr>, date <chr>, datetime <chr>,
# dropdown <chr>, freetext <chr>, number_decimal <chr>, number_integer <chr>,
# radio <chr>, radio_matrix_1 <chr>, radio_matrix_2 <chr>, slider <chr>, time <chr>
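Because the answers arrive as character columns, a common first step is to convert them to more specific types. The sketch below uses a small hand-made tibble that mimics a few of the columns above; the values are invented for illustration and are not output from the Nettskjema API. It also assumes that dates are formatted as day.month.year, as in the date column printed later in this document.

```r
library(dplyr)

# Hand-made stand-in for a downloaded form; the values are invented
dt <- tibble(
  form_id        = 123823,
  submission_id  = c("16785801", "16779763"),
  date           = c("07.10.2021", NA),
  number_integer = c("12", "3"),
  number_decimal = c("1.5", NA)
)

dt_typed <- dt %>%
  mutate(
    # Assumption: dates are written as day.month.year
    date = as.Date(date, format = "%d.%m.%Y"),
    # Convert the numeric answers from character to numeric
    across(c(number_integer, number_decimal), as.numeric)
  )

dt_typed
```

Missing answers stay NA after the conversion, so nothing needs special handling before the type change.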
There are many arguments that give you control over the data extraction. By default, the data is retrieved as defined by the codebook (use_codebook = TRUE). Setting this to FALSE retrieves the full-text information instead. If the codebook has not been set up, the initial download will fail because there is none, and you must activate it yourself in the Nettskjema portal.
Form 123823 has 3 responses to download.
Error: `select()` doesn't handle lists.
Run `rlang::last_error()` to see where the error occurred.
The error above occurs because this specific test form includes attachments, which currently cannot be retrieved without a codebook.
If you are checking the data from your form incrementally, you don’t have to download all submissions every time. In most cases the data requests are very fast, but for forms with a large number of responses, incremental downloads might be more efficient. The from_date argument restricts the download to submissions from a given date onward.
Form 123823 has 2 responses to download.
# A tibble: 2 × 18
form_id submission_id attachment_1 attachment_2 checkbox checkbox_matrix_1
<dbl> <chr> <chr> <chr> <chr> <chr>
1 123823 16785801 NA NA 1;2 1;2
2 123823 16779763 -9j5dzy.jpg amm_mowinckel_300.png 2 1
# … with 12 more variables: checkbox_matrix_2 <chr>, date <chr>, datetime <chr>,
# dropdown <chr>, freetext <chr>, number_decimal <chr>, number_integer <chr>,
# radio <chr>, radio_matrix_1 <chr>, radio_matrix_2 <chr>, slider <chr>, time <chr>
Another way to get data incrementally is by submission ID. Similar to from_date, from_submission lets you specify the submission ID from which data should be retrieved.
Form 123823 has 1 responses to download.
# A tibble: 1 × 16
form_id submission_id checkbox checkbox_matrix_1 checkbox_matrix_2 date datetime
<dbl> <chr> <chr> <chr> <chr> <chr> <chr>
1 123823 16785801 1;2 1;2 1;2 07.10.2021 15.10.2…
# … with 9 more variables: dropdown <chr>, freetext <chr>, number_decimal <chr>,
# number_integer <chr>, radio <chr>, radio_matrix_1 <chr>, radio_matrix_2 <chr>,
# slider <chr>, time <chr>
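When downloading incrementally, the new batch has to be merged with what you already have. The following is a minimal sketch of that bookkeeping, using two invented tibbles in place of real downloads. It assumes that submission IDs increase over time. It does not assume anything about whether the boundary submission is included in the new batch, which is why duplicates are removed after combining.

```r
library(dplyr)

# Invented stand-ins: 'previous' is an earlier download,
# 'latest' is a newer batch that overlaps with it by one submission
previous <- tibble(
  form_id       = 123823,
  submission_id = c("16508664", "16509317"),
  radio         = c("1", "2")
)
latest <- tibble(
  form_id       = 123823,
  submission_id = c("16509317", "16779763"),
  radio         = c("2", "1")
)

# Highest submission ID already stored; a candidate value for from_submission
# (assumes IDs increase with time)
last_id <- max(as.numeric(previous$submission_id))

# Append the new batch and keep one row per submission
combined <- bind_rows(previous, latest) %>%
  distinct(submission_id, .keep_all = TRUE) %>%
  arrange(as.numeric(submission_id))

combined
```

Removing duplicates on submission_id makes the merge safe to repeat, at the cost of keeping only the first copy of a submission if the same ID appears in both batches.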
The Nettskjema survey tool makes it possible to create checkboxes, i.e. to let respondents select several options within a question. How this should be returned as data is not clear-cut. The default behavior of the Nettskjema portal is to create one enumerated column per checkbox, with the content of the column cells being the codebook value. The default behavior of this package is to return the checkboxes as character strings, with the selected options separated by a semicolon (;).
# To start inspecting the data more
library(dplyr)
# These are the defaults, they don't need to be set
# They are just highlighted here
nettskjema_get_data(123823,
                    checkbox_type = "string",
                    checkbox_delim = ";") %>%
  # this data has all checkbox questions coded with names like "checkbox"
  select(form_id, submission_id, starts_with("checkbox"))
Form 123823 has 4 responses to download.
# A tibble: 4 × 5
form_id submission_id checkbox checkbox_matrix_1 checkbox_matrix_2
<dbl> <chr> <chr> <chr> <chr>
1 123823 16785801 1;2 1;2 1;2
2 123823 16779763 2 1 NA
3 123823 16509317 1 1 1
4 123823 16508664 1 1;2 1
These can be separated into rows, if wanted, using separate_rows() from the tidyr package.
# separate_rows() comes from tidyr
library(tidyr)
nettskjema_get_data(123823,
                    checkbox_type = "string",
                    checkbox_delim = ";") %>%
  select(form_id, submission_id, starts_with("checkbox")) %>%
  separate_rows(checkbox)
Form 123823 has 4 responses to download.
# A tibble: 5 × 5
form_id submission_id checkbox checkbox_matrix_1 checkbox_matrix_2
<dbl> <chr> <chr> <chr> <chr>
1 123823 16785801 1 1;2 1;2
2 123823 16785801 2 1;2 1;2
3 123823 16779763 2 1 NA
4 123823 16509317 1 1 1
5 123823 16508664 1 1;2 1
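Once the answers are in long format, counting how often each option was chosen is straightforward. The sketch below rebuilds the checkbox column from the output above by hand so that it runs without API access, and passes the delimiter to separate_rows() explicitly rather than relying on its default pattern.

```r
library(dplyr)
library(tidyr)

# Values copied by hand from the 'checkbox' column printed above
checks <- tibble(
  submission_id = c("16785801", "16779763", "16509317", "16508664"),
  checkbox      = c("1;2", "2", "1", "1")
)

# One row per selected option
long <- checks %>%
  separate_rows(checkbox, sep = ";")

# Number of times each option was selected
option_counts <- long %>%
  count(checkbox)

option_counts
```

Setting sep explicitly is mostly a safeguard: if you choose a different checkbox_delim, the same value can be passed on here.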
Another way is to request that the checkbox data be returned as list columns.
nettskjema_get_data(123823,
                    checkbox_type = "list") %>%
  select(form_id, submission_id, starts_with("checkbox"))
Form 123823 has 4 responses to download.
# A tibble: 4 × 5
form_id submission_id checkbox checkbox_matrix_1 checkbox_matrix_2
<dbl> <chr> <list> <list> <list>
1 123823 16785801 <chr [2]> <chr [2]> <chr [2]>
2 123823 16779763 <chr [1]> <chr [1]> <chr [1]>
3 123823 16509317 <chr [1]> <chr [1]> <chr [1]>
4 123823 16508664 <chr [1]> <chr [2]> <chr [1]>
The equivalent of separate_rows() for list columns is to unnest the list column with unnest() from the tidyr package.
nettskjema_get_data(123823,
                    checkbox_type = "list") %>%
  select(form_id, submission_id, starts_with("checkbox")) %>%
  unnest(checkbox)
Form 123823 has 4 responses to download.
# A tibble: 5 × 5
form_id submission_id checkbox checkbox_matrix_1 checkbox_matrix_2
<dbl> <chr> <chr> <list> <list>
1 123823 16785801 1 <chr [2]> <chr [2]>
2 123823 16785801 2 <chr [2]> <chr [2]>
3 123823 16779763 2 <chr [1]> <chr [1]>
4 123823 16509317 1 <chr [1]> <chr [1]>
5 123823 16508664 1 <chr [2]> <chr [1]>
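List columns can also be summarised without unnesting them. As a small sketch on a hand-made tibble (two rows built to resemble the list output above, not downloaded data), the number of selected options and a per-option flag can be computed directly on the list column:

```r
library(dplyr)

# Hand-made list column resembling the 'checkbox' output above
checks <- tibble(
  submission_id = c("16785801", "16779763"),
  checkbox      = list(c("1", "2"), "2")
)

checks_summary <- checks %>%
  mutate(
    # How many options each respondent selected
    n_selected = lengths(checkbox),
    # Whether option "2" is among the selected options
    chose_2 = vapply(checkbox, function(x) "2" %in% x, logical(1))
  )

checks_summary
```

This keeps one row per submission, which can be convenient when the checkbox answers need to stay aligned with the other columns.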
The last option is to return the data where each checkbox is a column, with a binary indicator showing whether the option was selected (1) or not (0).
nettskjema_get_data(123823,
                    checkbox_type = "columns") %>%
  select(form_id, submission_id, starts_with("checkbox"))
Form 123823 has 4 responses to download.
# A tibble: 4 × 8
form_id submission_id checkbox_1 checkbox_2 checkbox_matrix_1_1 checkbox_matrix_1_2
<dbl> <chr> <int> <int> <int> <int>
1 123823 16785801 1 1 1 1
2 123823 16779763 0 1 1 0
3 123823 16509317 1 0 1 0
4 123823 16508664 1 0 1 1
# … with 2 more variables: checkbox_matrix_2_1 <int>, checkbox_matrix_2_2 <int>
There is a gotcha with this last option. Currently, there is no way to indicate values that should actually be NA, i.e. if the question is optional, there is no way to know whether a lack of selection means the item was explicitly not selected or the respondent simply skipped the question.
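If you would rather treat a checkbox question with no selection at all as missing, one possible workaround is to derive the indicator columns yourself from the string format, where such answers appear as NA. The sketch below does this for values copied by hand from the checkbox_matrix_2 column shown earlier. The helper has_option() is hypothetical and not part of nettskjemar, and treating an empty answer as NA is an assumption about what you want, not something the package decides for you.

```r
library(dplyr)

# Values copied by hand from the 'checkbox_matrix_2' string output
checks <- tibble(
  submission_id     = c("16785801", "16779763", "16509317", "16508664"),
  checkbox_matrix_2 = c("1;2", NA, "1", "1")
)

# Hypothetical helper (not part of nettskjemar):
# 1 if 'option' was selected, 0 if other options were selected,
# NA if nothing was selected at all
has_option <- function(x, option, delim = ";") {
  selected <- vapply(
    strsplit(x, delim, fixed = TRUE),
    function(parts) option %in% parts,
    logical(1)
  )
  ifelse(is.na(x), NA_integer_, as.integer(selected))
}

indicators <- checks %>%
  mutate(
    checkbox_matrix_2_1 = has_option(checkbox_matrix_2, "1"),
    checkbox_matrix_2_2 = has_option(checkbox_matrix_2, "2")
  )

indicators
```

This does not recover information that was never recorded: it only moves the decision about how to code empty answers into your own script.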
If you want a quick overview of what your data contains, we recommend the skim() function from the {skimr} package.
── Data Summary ────────────────────────
Values
Name dt
Number of rows 4
Number of columns 18
_______________________
Column type frequency:
character 17
numeric 1
________________________
Group variables None
── Variable type: character ─────────────────────────────────────────────────────────────
skim_variable n_missing complete_rate min max empty n_unique whitespace
1 submission_id 0 1 8 8 0 4 0
2 attachment_1 1 0.75 11 37 0 3 0
3 attachment_2 3 0.25 21 21 0 1 0
4 checkbox 0 1 1 3 0 3 0
5 checkbox_matrix_1 0 1 1 3 0 2 0
6 checkbox_matrix_2 1 0.75 1 3 0 2 0
7 date 1 0.75 10 10 0 2 0
8 datetime 1 0.75 16 16 0 2 0
9 dropdown 0 1 1 1 0 2 0
10 freetext 0 1 3 9 0 4 0
11 number_decimal 1 0.75 3 5 0 2 0
12 number_integer 1 0.75 1 2 0 3 0
13 radio 0 1 1 1 0 2 0
14 radio_matrix_1 0 1 1 1 0 2 0
15 radio_matrix_2 0 1 1 1 0 2 0
16 slider 1 0.75 1 1 0 3 0
17 time 1 0.75 5 5 0 1 0
── Variable type: numeric ───────────────────────────────────────────────────────────────
skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100
1 form_id 0 1 123823 0 123823 123823 123823 123823 123823
hist
1 ▁▁▇▁▁
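The n_missing and complete_rate columns in the summary above can also be computed with base R, which may be handy when skimr is not installed. This is only a rough sketch of those two statistics on a hand-made data frame (two columns copied from the tables in this document), not a replacement for skim().

```r
# Two columns copied by hand from the output shown in this document
dt <- data.frame(
  submission_id = c("16785801", "16779763", "16509317", "16508664"),
  attachment_2  = c(NA, "amm_mowinckel_300.png", NA, NA),
  stringsAsFactors = FALSE
)

# Count missing values per column and turn them into a completion rate
n_missing <- colSums(is.na(dt))
summary_tbl <- data.frame(
  variable      = names(dt),
  n_missing     = unname(n_missing),
  complete_rate = 1 - unname(n_missing) / nrow(dt)
)

summary_tbl
```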
If you want a quick overview of data types and missing values, the vis_dat() function from the {visdat} package is a great graphical tool.