Zygosity - Norwegian Twin Registry questionnaire

Background

The zygosity questionnaire was developed by the Norwegian Institute of Public Health (FHI; Folkehelseinstituttet) for its twin registry studies. It is a series of questions about how similar the twins are, used to determine whether they are mono- or dizygotic.

Scoring

Classification

This note briefly describes the algorithm used to determine zygosity during recruitment in the 2000s.

Name	Question about	Used for
Drop	You and your twin were like two drops of water in childhood	Pairs and singles
Stranger	Strangers had trouble telling you apart when you were children	Pairs and singles
Eye	Similarity in eye color	Pairs
Voice	Similarity in voice	Singles
Dexter	Similarity in dexterity	Pairs and singles
Belief	Whether you believe you are mono- or dizygotic	Pairs and singles

“Single” twins here are those who responded alone, i.e. where data are not available for both twins in the pair. Similarity questions that are not listed in the table above, e.g. whether family members had trouble telling the twins apart, are not used in the classification.
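Whether the pair or the single weights apply depends on how many twins in a pair responded. A minimal sketch in base R, assuming one row per twin with a pair ID; `response_type()` is a hypothetical helper for illustration, not part of the questionnaires package:

```r
# Hypothetical helper: labels each respondent "pair" when both twins in the
# pair answered and "single" when only one did. Non-responders get NA.
response_type <- function(twin_pair, answered) {
  # Count the responding twins per pair, repeated on every row of that pair
  n_answered <- ave(as.integer(answered), twin_pair, FUN = sum)
  type <- ifelse(n_answered >= 2, "pair", "single")
  type[!answered] <- NA_character_
  type
}

# Illustrative data: the second twin in pair 2 gave no answers
items <- data.frame(drop = c(1, 2, 3, NA), belief = c(1, 1, 2, NA))
twin_pair <- c(1, 1, 2, 2)
answered <- rowSums(!is.na(items)) > 0  # TRUE if the row has any answer
response_type(twin_pair, answered)
```

Here pair 1 is scored with the pair formula and the first twin of pair 2 with the single formula.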

Weights

When the total zygosity score is calculated, a weight is applied to each item, and the weights depend on whether one or both twins in the pair have responded to the questionnaire.

Name	Question about	Weight (single)	Weight (pair)
Drop	You and your twin were like two drops of water	1.494	2.111
Stranger	Strangers had trouble telling you apart	0.647	0.691
Eye	Similarity in eye color	not used	0.394
Voice	Similarity in voice	0.347	not used
Dexter	Similarity in dexterity	0.458	0.366
Belief	Whether you believe you are mono- or dizygotic	0.417	0.481
Constant	Constant term in the formula	0.007	-0.087

Coding

“Form value” is the value the answer option has in the data file. “Score value” is the value used in the algorithm when zygocity is calculated.

Variable	Answer option	Form value	Score value
Drop	Like two drops of water	1	1
	Like most siblings	2	-1
	Don’t know	3	0
Stranger	Often	1	1
	Occasionally	2	0
	Never	3	-1
	Don’t know	4	0
Belief	Monozygotic	1	1
	Dizygotic	2	-1
	Don’t know	3	0
Eye, Voice & Dexter	Exactly the same	1	1
	Almost like	2	0
	Different	3	-1
	Don’t know	4	0

The form values are never used directly in the calculations, only the score values (-1, 0 or 1). For example, Drop takes the value 1 in the formula when the respondent answered that the twins were like two drops of water.
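The coding table can be expressed as a lookup from form value to score value. This is an illustrative sketch; `score_lookup` and `recode_score()` are hypothetical names, and the item names follow the Variable column above:

```r
# Hypothetical lookup: form value (as stored in the data file) -> score value,
# copied from the coding table
score_lookup <- list(
  drop     = c("1" = 1, "2" = -1, "3" = 0),
  stranger = c("1" = 1, "2" = 0, "3" = -1, "4" = 0),
  belief   = c("1" = 1, "2" = -1, "3" = 0),
  eye      = c("1" = 1, "2" = 0, "3" = -1, "4" = 0),
  voice    = c("1" = 1, "2" = 0, "3" = -1, "4" = 0),
  dexter   = c("1" = 1, "2" = 0, "3" = -1, "4" = 0)
)

recode_score <- function(item, form_value) {
  lookup <- score_lookup[[item]]
  # Index by name, so form value 2 maps to the entry named "2";
  # missing answers (NA) and unknown codes come back as NA
  unname(lookup[as.character(form_value)])
}

recode_score("drop", c(1, 2, 3, NA))
recode_score("stranger", c(1, 2, 3, 4))
```

Indexing by name rather than position keeps the mapping readable and makes unexpected codes show up as NA instead of being silently misassigned.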

Equation

The larger the absolute value of the final score, the more certain the classification. Answers that reveal more uncertainty about the similarity (e.g. a larger share of “Almost like” and “Don’t know”) give a score closer to zero.

Pair formula

For pairs where both twins have answered, the pair’s mean is first calculated for each score value. That is, Drop = (Drop1 + Drop2) / 2, and so on, where Drop1 is the score value of twin 1’s response and Drop2 is the score value of twin 2’s response in the same pair.

$$\text{zygosity} = 2.111 \cdot \frac{\text{drop}_1 + \text{drop}_2}{2} + 0.691 \cdot \frac{\text{stranger}_1 + \text{stranger}_2}{2} + 0.394 \cdot \frac{\text{eye}_1 + \text{eye}_2}{2} + 0.366 \cdot \frac{\text{dexter}_1 + \text{dexter}_2}{2} + 0.481 \cdot \frac{\text{belief}_1 + \text{belief}_2}{2} - 0.087$$

The sign of this “pair score” then determines zygosity in the same way as for single responders: a negative value means dizygotic, a positive value means monozygotic.
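A minimal sketch of the pair formula in base R, assuming each twin's answers are already recoded to score values and stored as named vectors. `pair_score()` and the item names are illustrative, not the package's API, and missing answers are not handled: any NA makes the score NA.

```r
# Weights and constant for pairs, from the weights table
pair_weights  <- c(drop = 2.111, stranger = 0.691, eye = 0.394,
                   dexter = 0.366, belief = 0.481)
pair_constant <- -0.087

# Average the two twins' score values item by item, then apply the weights.
# Items not in pair_weights (e.g. voice) are ignored.
pair_score <- function(twin1, twin2) {
  items <- names(pair_weights)
  mean_scores <- (twin1[items] + twin2[items]) / 2
  sum(mean_scores * pair_weights) + pair_constant
}

# Hypothetical pair: they disagree only on the Stranger item
twin1 <- c(drop = 1, stranger = 1, eye = 1, dexter = 0, belief = 1)
twin2 <- c(drop = 1, stranger = 0, eye = 1, dexter = 0, belief = 1)
score <- pair_score(twin1, twin2)
score  # 3.2445: positive, so the pair is classified as monozygotic
```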

Single formula

If only one twin in the pair has responded, the following is calculated:

$$\text{zygosity} = 1.494 \cdot \text{drop}_1 + 0.647 \cdot \text{stranger}_1 + 0.347 \cdot \text{voice}_1 + 0.458 \cdot \text{dexter}_1 + 0.417 \cdot \text{belief}_1 + 0.007$$

The sign of this “single score” then determines zygosity: a negative value means dizygotic, a positive value means monozygotic.
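The single formula and the sign rule can be sketched the same way. `single_score()` and `classify_zygosity()` are hypothetical helpers. The text does not say how a score of exactly zero is classified, so this sketch returns NA in that case; the absolute value is shown as the rough certainty measure described under Equation.

```r
# Weights and constant for single responders, from the weights table
single_weights  <- c(drop = 1.494, stranger = 0.647, voice = 0.347,
                     dexter = 0.458, belief = 0.417)
single_constant <- 0.007

single_score <- function(scores) {
  sum(scores[names(single_weights)] * single_weights) + single_constant
}

# Positive score -> monozygotic, negative -> dizygotic.
# A score of exactly 0 is not covered by the text; this sketch returns NA.
classify_zygosity <- function(score) {
  ifelse(score > 0, "monozygotic",
         ifelse(score < 0, "dizygotic", NA_character_))
}

# Hypothetical respondent: "like most siblings", strangers "never" confused
# them, voice "almost like", dexterity "different", believes dizygotic
answers <- c(drop = -1, stranger = -1, voice = 0, dexter = -1, belief = -1)
s <- single_score(answers)
s                     # -3.009
classify_zygosity(s)  # "dizygotic"
abs(s)                # larger values mean a clearer classification
```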

Data requirements

Column names

By default, the functions assume that the columns are named zygocity_XX, where XX is the question number in the inventory, zero-padded to two digits (i.e. with a leading zero for numbers below 10, e.g. 09). If your columns are named differently, you need to pass their names to the functions using tidyselect helpers (as used throughout the tidyverse). The columns should therefore follow a naming pattern that is easy to specify.
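To illustrate the default naming pattern, the snippet below generates zero-padded names and finds matching columns with a regular expression in base R. The ten items and the `zygocity_sum` column are arbitrary assumptions. With tidyselect, a selector such as `starts_with("zygocity_")` would pick columns by prefix in a similar way.

```r
# Zero-padded default names: zygocity_01, zygocity_02, ..., zygocity_10
default_cols <- sprintf("zygocity_%02d", 1:10)
default_cols[c(1, 9, 10)]

# Pick out only columns that follow the exact zygocity_XX pattern
answers_df <- data.frame(id = 1:2,
                         zygocity_01 = c(1, 2),
                         zygocity_02 = c(3, 1),
                         zygocity_sum = c(4, 3))
grep("^zygocity_[0-9]{2}$", names(answers_df), value = TRUE)
```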

Data values

The values in the columns should be the number of the answer option that was chosen, i.e. the form value from the coding table (1, 2 or 3, and for some questions also 4).
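A quick way to catch values that are not valid answer options, based on the ranges in the coding table (Drop and Belief have three options, the other items four). `invalid_values()` and `max_option` are hypothetical names used for illustration:

```r
# Highest valid form value per item, from the coding table
max_option <- c(drop = 3, belief = 3, stranger = 4, eye = 4, voice = 4, dexter = 4)

# TRUE where a non-missing value is outside 1..max for that item
invalid_values <- function(x, item) {
  !is.na(x) & !(x %in% seq_len(max_option[[item]]))
}

invalid_values(c(1, 3, 4, NA), "drop")  # only the 4 is flagged
```

Missing values are not flagged, since non-response is handled separately from invalid codes.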

Using the `zygo` functions

Currently undocumented…

```r
library(questionnaires)
library(dplyr)

# Example data: five twin pairs, answers coded as form values.
# Twins 4 and 9 gave no answers, so pairs 2 and 5 are scored as singles.
zygo <- tibble(
  id = 1:10,
  twinpair = rep(1:5, each = 2),
  drop = c(1, 2, 3, NA, 2, 2, 1, 1, NA, 2),
  stranger = c(1, 2, 4, NA, 2, 3, 3, 1, NA, 2),
  dexterity = c(1, 1, 3, NA, 2, 2, 1, 2, NA, 1),
  voice = c(2, 2, 3, NA, 2, 2, 1, 1, NA, 1),
  eye = c(2, 2, 2, NA, 2, 2, 1, 1, NA, 2),
  belief = c(1, 1, 2, NA, 2, 2, 1, 1, NA, 2)
)

zygo_compute(zygo,
             twin_col = twinpair,
             cols = 3:6,
             recode = FALSE)
#> # A tibble: 10 × 8
#>    zygo_eye zygo_drop zygo_stranger zygo_dexterity zygo_voice zygo_belief
#>       <dbl>     <dbl>         <dbl>          <dbl>      <dbl>       <dbl>
#>  1    0.788      2.11         0.691          0.366     NA           0.481
#>  2    0.788      4.22         1.38           0.366     NA           0.481
#>  3   NA          4.48         2.59           1.37       1.04        0.834
#>  4   NA         NA           NA             NA         NA          NA    
#>  5    0.788      4.22         1.38           0.732     NA           0.962
#>  6    0.788      4.22         2.07           0.732     NA           0.962
#>  7    0.394      2.11         2.07           0.366     NA           0.481
#>  8    0.394      2.11         0.691          0.732     NA           0.481
#>  9   NA         NA           NA             NA         NA          NA    
#> 10   NA          2.99         1.29           0.458      0.347       0.834
#> # ℹ 2 more variables: zygo_score <dbl>, zygo_zygocity <chr>
```