I stumbled upon N. Hermosa’s blog post here about an article on the economic burden of infant formula. The title of his post and the comments struck me, and I was curious about the article being discussed. I had doubts almost immediately. In this post, I try to make sense of the article with a general audience in mind.

In a recent article (gated) in the Journal of Human Lactation (not a joke!) written by a team funded by the WHO and Action for Economic Reform, Sobel et al. suggested that families who purchase infant formula spend more on education and medical care than families who don’t, by a very wide margin. The main message for individual families seems to be that it makes more sense not to purchase infant formula and to choose breastfeeding instead. For government, the message is that it makes sense to advocate breastfeeding to unburden poor families.

When I first read this article, I immediately got a sense of the conclusion the authors intended. The introduction discusses the harmful effects of infant formula, breastfeeding statistics and the incidence of diseases for both the US and the Philippines, and blames companies for the massive advertising of infant formula (who could forget the Promil kid?). You would expect the authors to try to show that it costs families more to be on an infant formula diet, both directly (through purchases of infant formula) and indirectly (through a shift in expenses from all other goods to infant formula).

Sobel et al. used data from the 2003 Family Income and Expenditure Survey (FIES) and Labor Force Survey (LFS) of the NSO. The section on data sources reveals a curious feature of how these two surveys were used: the LFS (!) was used to identify families with children under 2 years old. They focus on poor families (families earning less than $2 a day) with at least one young child (under 2 years old).

Given their problem, you would expect a statistical test comparing families who buy infant formula and families who don’t. They used a two-sample t-test to determine whether there is a difference in certain types of expenditures. For instance, do families who buy infant formula have more medical expenses? In effect, the purchase of infant formula is regarded as a treatment (as in an experiment). The table below (see the row with Difference) shows that families who buy infant formula have more medical expenditures (in absolute and relative terms) and that the difference is statistically significant (check the confidence intervals). They take this as the main evidence for their main message. This is one of the 2 tables available in the paper.
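For readers who have not seen a two-sample t-test, here is a minimal sketch of the comparison. The paper does not say which variant was used, so the unequal-variance (Welch) version is my assumption, and the expenditure figures are made up for illustration; they are not from the FIES.

```python
import math
import statistics


def welch_t(a, b):
    """Return (difference in means, Welch t statistic, approximate degrees of freedom)."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)  # sample variances
    n_a, n_b = len(a), len(b)
    se2 = var_a / n_a + var_b / n_b  # squared standard error of the difference
    t = (mean_a - mean_b) / math.sqrt(se2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = se2 ** 2 / ((var_a / n_a) ** 2 / (n_a - 1) + (var_b / n_b) ** 2 / (n_b - 1))
    return mean_a - mean_b, t, df


# Hypothetical medical expenditures (NOT taken from the paper or the survey)
buyers = [12.0, 15.0, 9.0, 20.0, 14.0]
non_buyers = [8.0, 11.0, 7.0, 10.0, 9.0]

diff, t_stat, df = welch_t(buyers, non_buyers)
print(f"difference = {diff:.2f}, t = {t_stat:.2f}, df = {df:.1f}")
```

A large t statistic only says that the two group averages differ by more than chance alone would suggest. It says nothing about why they differ, which is where the trouble starts.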

Can the results be trusted? There are some points that may confuse and distract the reader:

There seems to be a strong prior assumption that families that do not use infant formula have lower expenditures. The authors never gave the data a chance to prove them wrong. (See the discussion below.) In fact, their conclusions are unwarranted given the data they have used:

Increased use and purchase of infant formula has short- and long-term health implications for infants and young children and economically burdens families. Poor families may buy infant formula instead of investing in education or medical care. It is imperative for all governments to increase support for the promotion, protection, and advocacy of breastfeeding.

There are no calculations for the future burden on families using infant formula. The health status of the infant is also not observed over time. What would be better is a dataset that allows one to track families (using or not using infant formula) and infants along with the corresponding sicknesses and expenditures. The second sentence is true regardless of whether the family is poor or rich. Given a fixed budget, one really has to forgo something to get something else. By how much is not at all clear. Where did breastfeeding come from in the third sentence? The data available only indicate whether a family is a formula buyer. I am not sure if formula buyers and breastfeeders are mutually exclusive. Besides, the evidence they present is not enough to justify government intervention. In fact, I am wondering (pardon the ignorance) why so many families do not breastfeed at all if it is virtually free. Some women might be incapable of breastfeeding. Some children might be lactose intolerant. Some women choose to spend more time at work than at home with the infant. These women would choose to use infant formula! Therefore, there is an unobserved and unaccounted factor in the analysis!

The remaining table in the paper shows estimates (not actual figures!) of annual family expenditure on medical care, education and infant formula. They do acknowledge the limitations of the study on the last page of the article.

These economic estimates did not include other costs that may result from use of formula, such as lost wages caring for ill children, lost earnings from companies having to hire replacements during absenteeism, costs to governments in building more health facilities, or health worker time to deal with the added burden of illness. It did not consider human cost such as the vastly increased risk of death, chronic disease (eg, diabetes, child obesity, maternal breast cancer), lower IQ, decreased school performance, or employment potential. In short, the burden calculated here is probably conservative.

One surprising thing about the table above is that they made these calculations only for families purchasing formula and not for all other families. The table should have two factors: poor or not, and infant formula buyer or not. This would have made more sense (at least for medical care and educational expenditures).

These expenditures were also calculated for the whole family, not for the infant alone. This is crucial because a family in the dataset may have other children. Do poor families have more children? Possibly. What drives the results might be the number of children!

There was also a mention of multilinear regression. (I am not exactly sure what multilinear regression is, but my Google search tells me that it is just linear regression with more than one explanatory variable.) This part really grinds my gears. They say,

Multilinear regression was conducted predicting expenditure on medical care and expenditure on infant formula adjusting for total family income, place of residence, total amount spent on food (excluding formula), and education. Initially, all non-milk family expenditures were included. The final model included only those with P < .05. [emphasis mine]

So they wanted an estimate of expenditures “adjusting” (or “controlling”) for various observable characteristics of families. It is possible I read it wrong but they chose the wrong estimates! They should have used the “residual” after controlling for these expenditures! They should have subtracted the predicted expenditures from actual expenditures to get that part of expenditures “unexplained” by observable characteristics. I am also worried about the last two sentences because they do not disclose what has been done “in the kitchen”.
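To make the "residual" idea concrete, here is a rough sketch with a single control variable (income). The numbers and the variable names are hypothetical, and a real analysis would use all the controls the authors list; this only shows the mechanics of subtracting predicted from actual expenditures.

```python
def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical families (NOT from the paper): income, medical spending, formula buyer flag
income = [100.0, 120.0, 90.0, 150.0, 110.0, 130.0]
medical = [10.0, 14.0, 8.0, 18.0, 12.0, 13.0]
buyer = [0, 1, 0, 1, 1, 0]

intercept, slope = fit_line(income, medical)

# Residual = actual spending minus the spending predicted from income alone
residuals = [m - (intercept + slope * inc) for inc, m in zip(income, medical)]

# Average "unexplained" medical spending in each group
resid_buyers = [r for r, b in zip(residuals, buyer) if b == 1]
resid_non_buyers = [r for r, b in zip(residuals, buyer) if b == 0]
print(sum(resid_buyers) / len(resid_buyers), sum(resid_non_buyers) / len(resid_non_buyers))
```

The two group averages of the residuals are what I would have expected to see compared in a t-test, since they are the part of medical spending that income does not account for.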

To add to the confusion, they later state that, “Using multilinear regression, after adjusting for total family income, the average Philippine family with young children spent an additional $0.30 (95% CI: 0.24-0.36; r^2 = 0.08) on medical expenses for every $1 spent on formula. After adjusting for place of residence, total amount spent on food (excluding formula), and education, the value changed to $0.29 (95% CI: 0.23-0.35; r^2 = 0.08).”

It turns out that they first computed a regression of medical expenditures on infant formula expenditures! So infant formula expenditure is not the dependent variable, contrary to the quote above. They then computed a regression where adding some more variables does not change the estimate much. How did they compute the remaining parts of Table 1 then? Are these just averages of actual expenditures? I actually thought they computed some adjusted form of medical expenditures to be used in the two-sample t-test in Table 2.

Clearly, they are very confused about their methods. They could have included an indicator for formula buyer in the multilinear regression and used the estimated coefficient to make their point! This coefficient, under certain conditions, can be used to say something about the average difference in medical expenditures between families who buy formula and families who do not. Whether it is a good point has not been demonstrated thoroughly, both because of the unaccounted factor in the analysis (discussed above) and because current medical expenditures were used instead of future expenditures attributable to the infant! One last misleading point: They use the term “average Philippine family” where in fact they mean “Philippine family, on average”!
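Here is a toy illustration of the indicator approach, and of why the number of children matters. The data are constructed by me so that medical spending depends only on the number of children, and formula buyers happen to have more children; nothing here comes from the paper. The small solver is a plain normal-equations routine written for this sketch, not anything the authors used.

```python
def ols(X, y):
    """Ordinary least squares: solve (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        # Partial pivoting for numerical stability
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * p for a, p in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical families: formula buyer indicator and number of children
buyer = [1, 1, 1, 1, 0, 0, 0, 0]
children = [3, 4, 2, 3, 1, 2, 1, 3]
# By construction, medical spending depends ONLY on the number of children
medical = [5.0 + 10.0 * c for c in children]

# Raw comparison of group means, as in a two-sample t-test
mean_buyers = sum(m for m, b in zip(medical, buyer) if b) / sum(buyer)
mean_others = sum(m for m, b in zip(medical, buyer) if not b) / (len(buyer) - sum(buyer))
raw_diff = mean_buyers - mean_others

# Regression with an intercept, the buyer indicator, and the number of children
X = [[1.0, float(b), float(c)] for b, c in zip(buyer, children)]
intercept, buyer_coef, children_coef = ols(X, medical)
print(raw_diff, buyer_coef, children_coef)
```

In this made-up example the raw difference between buyers and non-buyers is positive, yet the coefficient on the buyer indicator is essentially zero once the number of children is in the regression. I am not claiming this is what happened in the paper, only that the reported comparison cannot rule it out.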

On the basis of the article alone, I have huge doubts about whether the research problem was thoroughly addressed.