-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy pathfct_harmonisation.qmd
247 lines (145 loc) · 7.86 KB
/
fct_harmonisation.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
# Food composition harmonisation and food matching
## Introduction
### Food matching
After the FCTs are standardised and harmonised and the food list (i.e., food reported as consumed in the Integrated Household Survey, Wave 5, 2019-2020), we can proceed to match them together. To do so, we are using a standardised list of foods that we called "food dictionary". As depicted in the @fig-matching.
![](images/food-matching_ani.gif){#fig-matching}
## Harmonising the Food Composition Tables
When all the FCTs are standardised, we can use them all together, which is particularly useful when food items and/or nutrient values are missing in the main FCT, then a similar food item could be found in another FCT.
## Getting the Nutrition Tools
NutritionTools is an R package of functions to help with a wide range of calculations and processes that commonly occur when working with nutrition datasets. More information can be found [here](https://tomcodd.github.io/NutritionTools/).
```{r}
if (!require("devtools")) {
install.packages("devtools")
}
devtools::install_github("TomCodd/NutritionTools")
```
### Food composition functions
There are some useful functions that can be downloaded [here](https://github.com/LuciaSegovia/UoN-FAO/tree/main/functions), and are currently being checked to be added to the NutritionTool package. Those can be loaded into the R environment by running the `functions.R` script.
```{r}
# We also need to import some custom functions in another script:
source(here::here("functions.R")) # Loading nutrition functions
```
The function `source()` will run the called script.
## Getting the FC Standardarised dataset
Download the following folders: KE18 & WA19 from [this GitHub repository](https://github.com/LuciaSegovia/fct/tree/master/FCTs). Eventually, you could download all of them an create a unique FC library.
Each folder has one script with the FCT id followed by *"_FCT_FAO_Tags"*, and a README file.
```{r}
# Finding the list of FCTs/FCDBs script in the data folder
list.files("data/", pattern = "*_FCT_FAO_Tags", recursive=FALSE, # so it is not looking into the subfolders
full.names=TRUE)
```
The next block of code will download and standardise each individual FCT. Then, the FCTs are merged into a common FC library that can be used for food matching.
```{r}
# Getting the list of FCTs/FCDBs script in the data folder
source_fct_name <- list.files("data/", pattern = "*_FCT_FAO_Tags.R", recursive=FALSE, # so it is not looking into the subfolders
full.names=TRUE)
for(i in source_fct_name){
source(here::here(i))
}
```
### Merging the data
Checking the FCT we have in our data folder
```{r}
# finding all the cleaned FCTs/FCDBs from the output folder
list.files("data/", pattern = "*_FCT_FAO_Tags", recursive=FALSE, # so it is not looking into the subfolders.
full.names=TRUE)
```
Now, we can merge all the FCTs/FCBDs into one file. Note that this is posible because they all have been standardised previously.
```{r}
# finding all the cleaned FCTs/FCDBs from the output folder
list.files("data/", pattern = "*_FCT_FAO_Tags", recursive=FALSE, # so it is not taking the fcts in the folder
full.names=TRUE)%>%
map_df(~read_csv(., col_types = cols(.default = "c"),
locale = locale(encoding = "Latin1")))
```
We can check that all FCTs that we are expected are there, by using the `source_fct` variable that is generated within the standardisation scripts, and the number of foods in each one.
```{r}
#checking that all FCTs are loaded and
# counting No. of items
data.df %>%
count(source_fct)
```
## Food matching
First, we need the list of unique foods reported as consumed. In HCES dataset, this is frequently presented as set of standard list of foods. We are also interested in knowing the frequency with each food is reported, and hence their impact and importance for subsequent analysis.
```{r}
# Read, subset and rename the data
ihs5_consumption <-
read_dta(here::here("data", "mwi-ihs5-sample-data", "HH_MOD_G1_vMAPS.dta")) |>
select(
case_id,
HHID,
hh_g01,
hh_g01_oth,
hh_g02,
hh_g03a,
hh_g03b,
hh_g03b_label,
hh_g03b_oth,
hh_g03c,
hh_g03c_1
) %>%
rename(
consumedYN = hh_g01,
food_item = hh_g02,
food_item_other = hh_g01_oth,
consumption_quantity = hh_g03a,
consumption_unit = hh_g03b,
consumption_unit_label = hh_g03b_label,
consumption_unit_oth = hh_g03b_oth,
consumption_subunit_1 = hh_g03c,
consumption_subunit_2 = hh_g03c_1
)
# Getting the food item list
ihs5_consumption <- hcesNutR::create_dta_labels(ihs5_consumption)
# Getting the food list & frequency of HH
food_list <- ihs5_consumption %>%
count(food_item_code, food_item_name)
```
Then, we will match those food items with their corresponding food dictionary code(s). There are instances were the matching will be one food reported to many foods in the FCT. For example, wheat flour will be matched to wheat flour refined, and wheat flour wholemeal.
```{r}
# Food dictionary
dictionary <- read.csv("https://raw.github.com/LuciaSegovia/fct/repro/metadata/MAPS_food-dictionary_v3.0.3.csv")
```
Then, the unique food dictionary codes (`ID_3`) will be used to match the food in the food list to the FCTs.
```{r}
# Matching
ihs5 <- read.csv(here::here("data", "fct_ihs5_v2.2.csv"))
```
::: callout-note
## Activity:
Thinking about food matches between HCES food list and food composition.
1. Do you think that the food matches selected represents well what the food reported as consumed in the survey?
2. If not, what other foods would you add/remove/change to?
3. For the foods without matches, can you think of what food items you would use?
4. Could you identify any food matches in the Food Composition Library that will be a good match for those foods that have no matches?
5. What nutrients are important for each of the foods selected?
:::
## Dealing with missing values
### Combining Tagnames to generate variables
### Re-calculating variables
Some varibles need to be recalculated, as part of the harmonisation process and also for quality assurance. One case is Energy (kcal/kJ) which is calculated from the proximate: Protein, Fat, available Carbohydrates, Fibre and Alcohol. Hence, we need to make sure that all these variables are reported and are completed. For instance, if there were missing values in Fat content, that the combination of Tagnames have been performed. In addition, if we are using Carbohydrate by difference, then we should re-calculate that variable as well.
```{r}
# Re-calculate variables:
data.df %>%
# Calculate available Carbohydrates, by difference
CHOAVLDFg_std_creator() %>%
# Calculate Energy (kcal)
ENERCKcal_standardised() %>%
# calculate Energy (kJ)
ENERCKj_standardised()
```
Another similar example is Vitamin A (RE/RAE), which is calculated from retinol and the carotenoids (i.e., Beta-carotene equivalents). Similarly, we need to check that those two variables. Note, that beta-carotene eq. is also re-calculated, when possible, from the carotenids and their conversion factors. Hence, first we should check that beta-carotene, alpha-carotene, and beta-crypoxanthin are available.
```{r}
# Re-calculate variables:
data.df %>%
# Recalculate beta-carotene eq.
CARTBEQmcg_std_creator() %>%
# Recalculate Vitamin A (RAE)
VITA_RAEmcg_std_creator() %>%
# Recalculate Vitamin A (RE)
VITAmcg_std_creator()
```
### Further Readings
<br>
- Greenfield, Heather, and D. A. T. Southgate. Food Composition Data: Production, Management, and Use. Rome: FAO, 2003.
- FAO/INFOODS (2012). FAO/INFOODS Guidelines for Checking Food Composition Data Prior to the Publication of a User Table/Database-Version 1.0. FAO, Rome'. Accessed 22 January 2022. https://www.fao.org/3/ap810e/ap810e.pdf.