Dataset: The Health of Houston Survey 2018


The Health of Houston Survey 2018 (HHS 2018) follows our 2010 household survey in providing updated and accurate information about the health of people living in Harris County and the City of Houston (the Houston area). Data from the survey are fully accessible online and are intended to meet the needs of health agencies, health care service providers, local organizations and the community at large for scientifically valid health information about the population. This information can be used to track emerging health issues, assess the impact of health programs, and document health improvements in the Houston area in a comparable way over time. HHS 2018 collected extensive information for multiple segments of the population on health status, health conditions, behavioral risk factors, cancer screening, insurance coverage and health care access, prenatal care and neighborhood conditions. The survey contains health and sociodemographic information for each adult respondent (selected at random within the household when a landline number was dialed) and for one randomly selected child in households with children. Questions about children were asked only if the adult respondent was the parent, guardian or grandparent of the randomly selected child.

The HHS 2018 sample is representative of Harris County and the City of Houston's non-institutionalized adult population living in households and was designed to capture reliable data for a number of subpopulations: (1) overall population of Harris County and City of Houston; (2) each of the seven subcounty areas created by aggregating the American Community Survey (ACS) Public Use Microdata Areas (PUMAs) in Harris County; (3) main racial and ethnic groups (White, Hispanic, African American, and Asian); and (4) main age and income groups.

To achieve these objectives cost-effectively, the survey employed a dual-frame Random Digit Dialing sample design, combining landline and cellphone numbers. Data collection started in June 2017 but, about halfway to completion, had to pause in August 2017 because of the devastation from Harvey-related flooding. The survey resumed in February 2018, when the instrument was modified to include questions on how the flooding affected respondents' lives: resulting health conditions, flooding and property damage, income and employment changes, evacuation, assistance and aid, and resilience and recovery after Harvey. To make room for the Harvey-related questions, a small number of questions that were not part of the core questionnaire were dropped and were not asked of the post-Harvey sample.


(1) A weight variable (e.g., FinalWgt) must be applied to all cross-tabulations to ensure valid population inferences from the sample data. There are separate weights for the main sample and for the pre- and post-Harvey samples (see the Weighting section for more information). In Nesstar, weights are accessed through the scales icon on the task bar.
(2) Users must exercise caution in interpreting tabulation results that contain very small unweighted cell counts, because these produce unreliable estimates and cannot support valid inferences.
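As an illustration of points (1) and (2) above, the sketch below computes a weighted cross-tabulation in plain Python while keeping the unweighted count for each cell, so that small cells can be flagged. It assumes the records are already loaded as dictionaries keyed by variable name. The variable names "insured" and "has_diabetes", the toy weights, and the minimum cell size of 30 are all hypothetical placeholders; this documentation does not specify a threshold.

```python
from collections import defaultdict

# Placeholder threshold: the HHS 2018 documentation does not define a minimum cell size.
MIN_UNWEIGHTED_N = 30


def weighted_crosstab(records, row_var, col_var, weight_var="FinalWgt",
                      min_n=MIN_UNWEIGHTED_N):
    """Cross-tabulate two variables using survey weights.

    Returns {(row_value, col_value): (weighted_total, unweighted_n, too_small)},
    where too_small flags cells whose unweighted count is below min_n.
    """
    weighted = defaultdict(float)
    unweighted = defaultdict(int)
    for rec in records:
        cell = (rec[row_var], rec[col_var])
        weighted[cell] += rec[weight_var]  # estimated population count in the cell
        unweighted[cell] += 1              # number of actual respondents in the cell
    return {
        cell: (weighted[cell], unweighted[cell], unweighted[cell] < min_n)
        for cell in weighted
    }


# Toy records with hypothetical variable names and invented weights.
sample = [
    {"insured": 1, "has_diabetes": 0, "FinalWgt": 812.5},
    {"insured": 1, "has_diabetes": 1, "FinalWgt": 640.0},
    {"insured": 0, "has_diabetes": 0, "FinalWgt": 1210.0},
]
table = weighted_crosstab(sample, "insured", "has_diabetes")
for cell, (estimate, n, too_small) in sorted(table.items()):
    flag = "  <- small cell, interpret with caution" if too_small else ""
    print(cell, f"{estimate:.1f}", n, flag)
```

With only three toy records, every cell falls below the placeholder threshold and is flagged, which is exactly the situation point (2) warns about.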

To download the data, please fill out a standard public use file data agreement form ( and send it to Study documentation/codebook can be downloaded before signing the PUDF agreement by clicking the download (disc) icon in Nesstar, at the top right of the screen. If you would like to receive notices about dataset updates, please register at the survey website. The dataset is Version 1.1, as of August 31, 2019.


Document Description

Full Title

The Health of Houston Survey 2018

Identification Number


Authoring Entity

Institute for Health Policy, UTHealth School of Public Health


Name | Affiliation | Abbreviation | Role
Institute for Health Policy, UTHealth School of Public Health | The University of Texas Health Science Center at Houston | IHP, UTSPH | Principal investigators


Name | Affiliation | Abbreviation | Role
ICF Macro, Inc. | ICF International, Inc. | ICF | Survey contractors (conducted survey design, data collection and weighting)


Copyright © 2019 Health of Houston Survey, Institute for Health Policy, The UTHealth School of Public Health.

Date of Production




Full Title




Study Description

Full Title

The Health of Houston Survey 2018

Identification Number


Authoring Entity

Name | Affiliation
Institute for Health Policy | UTHealth School of Public Health

Date of Production


Funding Agency/Sponsor

Name Abbreviation Role Grant
Houston Endowment
Episcopal Health Foundation
Texas Children's Hospital
Memorial Hermann Health System
Community Health Choice/Harris Health System
UTHealth, President's Excellence Fund
UTHealth School of Public Health, Office of the Dean
Texas Medical Center Health Policy Institute

Data Distributor

Name | Affiliation | Abbreviation
UTHealth School of Public Health, Institute for Health Policy | University of Texas System | IHP





List of Keywords

Behavioral risk factors (smoking, secondhand smoke, alcohol abuse, diet, physical activity), colorectal cancer, demographics (age, gender, race/ethnicity, country of origin, languages spoken at home, citizenship), economic hardship, employment, environmental risks, general health status, health and dental care access, health and dental insurance status, health conditions (obesity, diabetes, asthma, cancer, cardiovascular disease, hypertension), income, mammography, mental health access and utilization, mental health assessment, neighborhood, housing, pap test, prenatal care/breastfeeding.


United States of America (USA)

Geographic Coverage

The geographic coverage is Harris County, Texas, and those portions of the City of Houston that extend into Montgomery and Fort Bend counties.

Unit of Analysis

The units of analysis are individuals (children ages 0-17 and adults 18 and older).


The universe includes non-institutionalized individuals of all ages who reside in Harris County and the City of Houston.

Time Method

Data collection started in June 2017 but was paused at the end of August 2017 because of the Hurricane Harvey disaster. It resumed in February 2018 and ended in May 2018.

Sampling Procedure

The Health of Houston Survey 2018 sampling goals were to produce valid and reliable estimates for the non-institutionalized adult population of Harris County and the City of Houston, as well as for seven contiguous sub-county areas. The survey was also designed to produce estimates by sociodemographic characteristics, including race/ethnicity, age group and poverty level.

The sample design was a stratified, list-assisted Random Digit Dialing (RDD) sample of landline and cell phone numbers, supplemented with an oversample to increase the number of Asian respondents. Between June 2017 and May 2018, interviews were carried out via cell phones and landlines randomly selected within each sub-county geographic stratum. These strata followed the American Community Survey's Public Use Microdata Area (PUMA) boundaries and divided Harris County into seven sampling areas (see our Survey Methodology at Close to 5,700 adult respondents, representing the entire adult population of Houston and Harris County, answered questions on health, health insurance and care access, mental health, prenatal care, diet and exercise, and neighborhood conditions, among other issues. Adult respondents also provided details on the health and health care of a randomly selected child in the household. The telephone interviews were conducted in English and Spanish.

Mode of Data Collection

Data collection relied on telephone interviews using a combination of landline and cellphone dialing. The dialing protocols followed a suggested monthly interviewing schedule: where possible, all calls for a given survey month were completed within that month, although in some cases samples begun in one month were completed in the first 7-10 days of the next. Up to 15 call attempts could be made for each landline number and up to eight for each cell phone number in the sample. The call centers also adjusted their schedules to accommodate holidays and special events, made weeknight calls after 5:00 PM CST, and honored respondents' requests for specific callback or appointment times whenever possible. An eligible household was defined as a housing unit that has a separate entrance, whose occupants eat separately from other persons on the property, and that is occupied by its members as their principal or secondary place of residence. The following were ineligible: group homes, institutions, and (in the landline sample) households outside the Houston area or Harris County, Texas.

The Health of Houston Survey relies on self-reports: if respondents reported that they lived in a private residence, interviewers did not question them further about their residence. Eligible household members included all related adults (aged 18 years or older), unrelated adults, boarders/roomers, live-in au pairs or students, and domestic workers who considered the household their home, even if they were not home at the time of the call. Residents of college housing were treated as single-adult households. Household members did not include adult family members who were currently living elsewhere.

Weighting

When estimating population statistics, weighting is often used to make a survey sample representative of the target population. The purposes of weighting survey data are to compensate for unequal probabilities of selection, to adjust for non-response and telephone non-coverage, and to ensure that results are consistent with the population and demographic data of the area that was surveyed. When properly applied, weighting can reduce the effects of nonresponse and coverage gaps on the reliability of the survey results. A survey weight is a value assigned to each case in the data file, in order to make statistics computed from the data more representative of the population. HHS 2018 survey weights were computed to correct for disproportionate sampling probabilities introduced by the sampling design, including unequal probabilities due to the dual-frame sample and Asian oversample, and to correct for differences in demographic characteristics of the sample versus the population.
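To make the effect of weighting concrete, the sketch below computes the weighted mean (ratio) estimator, sum(w_i * y_i) / sum(w_i), that underlies weighted proportions and tabulations. The data are invented for illustration: the first two respondents stand in for an oversampled group, so each carries a smaller weight, and the weighted share of a yes/no outcome therefore differs from the raw sample share.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w_i * y_i) / sum(w_i)."""
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w * y for w, y in zip(weights, values)) / total_weight


# Invented data: 1 = "yes", 0 = "no". The first two cases come from an
# oversampled group, so each represents fewer people (smaller weight).
outcome = [1, 1, 0, 0]
weight = [50.0, 50.0, 150.0, 150.0]

unweighted = sum(outcome) / len(outcome)   # 2 / 4 = 0.5
weighted = weighted_mean(outcome, weight)  # 100 / 400 = 0.25
print(f"unweighted: {unweighted:.2f}, weighted: {weighted:.2f}")
```

Ignoring the weights here would overstate the "yes" share by a factor of two, because the oversampled group is overrepresented among respondents.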

Doing this reduces the risk of nonresponse and coverage bias in estimates and statistical analyses for these populations. When analyzing HHS 2018 data, it is important to use the correct weight. The HHS 2018 Public Use Data File (PUDF) is composed of three datasets: (1) the MAIN FILE, which includes questions asked of the entire sample, adult respondents and children (n=7210); (2) the PRE-HARVEY FILE, containing questions asked only of adult respondents interviewed before August 2017 (n=2709); and (3) the POST-HARVEY FILE, containing Harvey-related questions asked only of adult respondents interviewed six to nine months after Harvey (n=2985). Each data file includes three weight variables, because approximately half of the survey data were collected before Hurricane Harvey and half afterwards. After Harvey, we added questions about the effects of the hurricane on respondents and their households; to add these questions without increasing the time needed to complete the survey, some questions were removed from the post-Harvey questionnaire. The metadata for each variable notes whether the corresponding question was added or removed because of Harvey, along with the weight to use when analyzing that variable.

The three weight variables in the data files are FinalWgt (MAIN FILE), FinalWgt_pre (PRE-HARVEY FILE), and FinalWgt_post (POST-HARVEY FILE). FinalWgt should be applied to all cases when analyzing variables based on questions asked both before and after Harvey (that is, of the entire sample); these are by far the majority of variables. FinalWgt_pre should be used when analyzing questions asked only of those surveyed before Hurricane Harvey, who are identified by the variable 'pre_harvey' with value '1' (Yes). Likewise, FinalWgt_post should be used when analyzing variables based solely on questions asked of those interviewed after Harvey, who are identified by 'pre_harvey' with value '0' (No).
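The weight-selection rule described above can be sketched as a small lookup table. FinalWgt, FinalWgt_pre, FinalWgt_post and pre_harvey are the variables named in this documentation; the scope labels "both", "pre" and "post" are hypothetical shorthand for what each variable's metadata records, and the helper select_cases is likewise a hypothetical name used only for illustration.

```python
# Hypothetical scope labels mapped to (weight variable, pre_harvey value to keep).
# None means "keep every case".
WEIGHT_RULES = {
    "both": ("FinalWgt", None),    # question asked before and after Harvey
    "pre": ("FinalWgt_pre", 1),    # asked only before Harvey (pre_harvey == 1)
    "post": ("FinalWgt_post", 0),  # asked only after Harvey (pre_harvey == 0)
}


def select_cases(records, scope):
    """Return the eligible records and the matching weights for a question scope."""
    weight_var, keep_value = WEIGHT_RULES[scope]
    chosen = [r for r in records
              if keep_value is None or r["pre_harvey"] == keep_value]
    return chosen, [r[weight_var] for r in chosen]
```

For example, select_cases(records, "post") keeps only respondents with pre_harvey equal to 0 and pairs each of them with FinalWgt_post; the returned weights can then be passed to any weighted estimator.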

We have also included a geographic stratum variable ('geostr'), which defines the sampling stratum for each adult respondent. Because the HHS survey uses a complex stratified probability sample, the final weight (e.g., 'FinalWgt') affects the point estimates, while the stratification information in 'geostr' affects the standard errors. Consequently, analyses of the PUDF in local statistical software require both the weight and the stratum variable to obtain accurate point estimates and standard errors. To analyze variables from HHS2018_PostHarvey or HHS2018_PreHarvey together with variables from the main dataset (HHS2018 PUDF), all three datasets need to be downloaded and merged, and the respective weight and geographic stratum used to produce accurate weighted estimates. For example, the HHS survey weights and strata can be declared for analysis of the main file in Stata with:

svyset [pweight=FinalWgt], strata(geostr)

Exercise care when an analysis is based on small unweighted cell counts, because these yield very unstable and unreliable estimates.
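For readers working outside Stata, the sketch below illustrates roughly what a design declared as svyset [pweight=FinalWgt], strata(geostr) implies for a weighted mean: the weights determine the point estimate, and the strata enter a Taylor-linearized variance. It assumes, as we understand that svyset line to imply, that each respondent is its own primary sampling unit and that no finite population correction is applied. The function name stratified_weighted_mean is hypothetical, and the sketch is meant to show the role of geostr, not to replace established survey software.

```python
import math
from collections import defaultdict


def stratified_weighted_mean(y, w, strata):
    """Weighted mean and its Taylor-linearized standard error.

    Assumption: a stratified design in which every respondent is its own
    primary sampling unit, sampled with replacement (no finite population
    correction), as for survey data declared with strata but no PSU variable.
    """
    total_w = sum(w)
    mean = sum(wi * yi for wi, yi in zip(w, y)) / total_w

    # Linearized (influence) values of the ratio estimator, grouped by stratum.
    by_stratum = defaultdict(list)
    for yi, wi, h in zip(y, w, strata):
        by_stratum[h].append(wi * (yi - mean) / total_w)

    # Between-respondent variability is summed within strata only.
    variance = 0.0
    for z in by_stratum.values():
        n_h = len(z)
        if n_h < 2:
            raise ValueError("every stratum needs at least two respondents")
        z_bar = sum(z) / n_h
        variance += n_h / (n_h - 1) * sum((zi - z_bar) ** 2 for zi in z)
    return mean, math.sqrt(variance)
```

With a single stratum and equal weights, the standard error reduces to the familiar s / sqrt(n); the stratum sums are what change when geostr is supplied.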

PUDF users should take care when merging files: some household and sociodemographic variables are included in all three datasets, to allow quick tabulation of each dataset's own variables against them, and these variables will be duplicated after a merge. For the repeated variables, it is best to keep the copies from the MAIN FILE, because that dataset contains all adult and child respondents. The dataset is Version 1.1, as of August 31, 2019.
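A minimal sketch of that merge rule, assuming the files share a respondent identifier. The name case_id below is hypothetical, since this documentation does not name the linking variable, and flood_damage is a hypothetical Harvey-related variable. Each supplementary row is joined onto the matching MAIN FILE row, and for any variable present in both files the MAIN FILE value is kept.

```python
def merge_with_main(main_rows, extra_rows, id_var="case_id"):
    """Left-join extra_rows (e.g. PRE- or POST-HARVEY FILE) onto main_rows.

    id_var is a hypothetical respondent ID. When a variable appears in both
    files, the value from main_rows is kept, as the documentation recommends.
    """
    extra_by_id = {row[id_var]: row for row in extra_rows}
    merged = []
    for row in main_rows:
        combined = dict(extra_by_id.get(row[id_var], {}))  # supplementary values first
        combined.update(row)  # MAIN FILE values overwrite duplicated variables
        merged.append(combined)
    return merged


main_file = [
    {"case_id": 1, "age": 34, "FinalWgt": 900.0},
    {"case_id": 2, "age": 58, "FinalWgt": 1100.0},
]
post_file = [
    {"case_id": 2, "age": 58, "flood_damage": 1, "FinalWgt_post": 2050.0},
]
merged = merge_with_main(main_file, post_file)
```

Respondents without a match in the supplementary file simply lack its variables, so analyses of post-Harvey items should still be restricted to cases with pre_harvey equal to 0.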

Related Materials

2018 Phone (CATI) questionnaire

This is the phone questionnaire used in the 2018 Health of Houston Survey.

2018 Methods report

This is the 2018 Health of Houston Survey Methods Report.

Mapping the Houston area by PUMAs (Excel File)

A crosswalk between Harris County ACS PUMA numbers and their respective names.

HHS 2018 PUMAs map

Data Files Description

File Name


Overall Case Count


Overall Variable Count


Type of File

Nesstar 200801

