SW 132 Research and Evaluation for Social Work Practice
Schram
Fall 2001
Key Points for 10/23/01

1.  Most research, particularly survey research, involves sampling. A sample is a subset of the population the researcher wants to study.

2.  The two main types of sampling are probability and nonprobability sampling. In probability sampling, the sample is selected in such a way that it is possible to estimate the probability that each element, unit of analysis or person being studied has of being represented in the sample. In nonprobability sampling, the sample is selected in such a way that it is not possible to estimate that probability. Probability samples are preferred for this reason, but they are not always feasible.

3.  Probability sampling enables the researcher to estimate the probability that the sample selected is representative of the population. All probability samples are designed to ensure that each element, unit of analysis or person in the population being studied has an equal chance of being selected in the sample.

4.  This is the principle of random sampling. A random sample is a sample where everybody in the population being studied had an equal chance of being selected in the sample.

5.  Sampling is done according to the following logic and steps using the terms specified:

(a) A population, such as all clients of a particular agency, is specified for study. A study population, such as all clients currently using the services of the agency, is defined for the purposes of selecting a sample or subset of the population.

(b) Each client to be studied is an element of the sample. In most cases, the sample elements, in this case clients, will also be the sampling units that are actually selected and the observation units that are actually interviewed. The client will also be the unit of analysis later when the data are analyzed. Most studies are like this: the people being selected (i.e., the elements and the sampling units) are also the people being interviewed (i.e., the observation units) and the people being studied (i.e., the units of analysis). Yet it is possible to imagine a study where the people being studied are different from those selected (anyone who comes to the door at each house), who are different from the people being interviewed (the head of the family), who are in turn different from the unit of analysis (the family overall). So it is good to distinguish the element, the sampling unit, the observation unit, and the unit of analysis, though all are often the same, as in a study of clients.

(c) A sampling frame is an actual list of the sampling units, such as a client roster, from which a sample can be selected.

(d) A variable is a measurable item, such as a trait, characteristic, opinion or attitude which varies for the units being studied, such as the age, race, gender, levels of satisfaction, etc. of clients.

(e) A parameter is the statistical description of a variable for the population; for example, the average age of all clients at the agency is 34.4.

(f) A statistic is the statistical description of a variable for the sample; for example, the average age of the clients in the sample is 33.3. Statistics estimate parameters. Probability samples, when done properly, enable us to estimate the probability that a statistic is a good estimate of a parameter.

(g) Sampling error can only be estimated for probability samples and is represented in terms of how high a confidence level one can have about a sample and how wide a confidence interval one can have for statistical estimates made from that sample.
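The distinction between a parameter and a statistic in (e) and (f) can be illustrated with a minimal Python sketch. The population of client ages below is made up for illustration, and the names population_ages and sample_ages are assumptions, not data from the course or the text.

```python
import random
import statistics

# Hypothetical population: ages of 1,000 clients (made-up data for illustration).
rng = random.Random(132)  # fixed seed so the run is repeatable
population_ages = [rng.randint(18, 80) for _ in range(1000)]

# Parameter: describes the variable for the whole population.
parameter = statistics.mean(population_ages)

# Statistic: describes the same variable for a simple random sample
# of 100 clients drawn from that population.
sample_ages = rng.sample(population_ages, 100)
statistic = statistics.mean(sample_ages)

print(f"Parameter (population mean age): {parameter:.1f}")
print(f"Statistic (sample mean age):     {statistic:.1f}")
print(f"Difference due to sampling:      {statistic - parameter:+.1f}")
```

In a real study the parameter is usually unknown; it can be computed here only because the whole population was invented for the example.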

6.  Random selection is where each element of the population has an equal chance of being selected in the sample. The principle of random selection must be maintained if we are to have a probability sample that can be used to estimate the probability that our statistics are representative of parameters.

7.  As we might suspect, the larger the sample, the higher the probability our sample is representative. For random samples, the sample statistics that estimate population parameters will have a higher probability of being representative the larger the sample.

8.  The larger the sample, the higher the confidence level and the narrower the confidence interval, which is to say the higher the probability that the sample statistical estimate is representative of the population parameter. The larger the sample, the smaller the standard error.

9.  The standard error is the basic statistic used to calculate the confidence level and confidence interval. For purposes of simplicity, the process of estimating the standard error, and thereby the representativeness of a sample, can often be best demonstrated if we assume we are trying to estimate the parameter for a variable where half the population has a trait or holds a certain opinion and the other half does not, say where half the clients are unhappy with their treatment and the other half are happy. The sampling distribution for a variable that has only two values is called a binomial sampling distribution. The formula for the standard error for any binomial sampling distribution is s = the square root of (P x Q)/n, where s = the standard error, P = the proportion for and Q = the proportion against (each expressed as a decimal, so that P + Q = 1), and n = the number of people in the sample. If we sample 100 people and 50% say yes, they are happy with treatment, and 50% say no, then P = .5 and Q = .5, and the formula gives the square root of (.5 x .5)/100 = .05, so the standard error is .05 or 5%. According to probability theory, about 68 percent of sample estimates fall within plus or minus one standard error of the parameter, and about 95 percent within two (in this case plus or minus 10%, that is to say 40 to 60 percent). According to this logic, we can estimate the confidence level and interval for our sample. So for a binomial distribution where half are happy and half are not, and a sample of 100, we have a 95 percent confidence level that the estimate is within a confidence interval of plus or minus 10 percent. The formula for distributions other than the binomial one is similar, if more complicated. The principle is the same.
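The arithmetic in point 9 can be checked with a short Python sketch. The function names binomial_standard_error and confidence_interval are assumptions chosen for this illustration, and the interval uses the "plus or minus two standard errors" rule of thumb described above.

```python
import math

def binomial_standard_error(p, n):
    """Standard error of a sample proportion: sqrt(P * Q / n), with Q = 1 - P."""
    q = 1 - p
    return math.sqrt(p * q / n)

def confidence_interval(p, n, num_standard_errors=2):
    """Interval of plus or minus the given number of standard errors around p."""
    margin = num_standard_errors * binomial_standard_error(p, n)
    return p - margin, p + margin

# The example from the notes: 100 people sampled, half happy and half not.
se = binomial_standard_error(0.5, 100)
low, high = confidence_interval(0.5, 100)

print(f"standard error: {se:.2f}")            # standard error: 0.05
print(f"interval: {low:.2f} to {high:.2f}")   # interval: 0.40 to 0.60
```

Passing num_standard_errors=1 gives the narrower interval that corresponds to the 68 percent level.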

10.  We can turn this around. We can start with the assumption that we want a 95 percent confidence level, which is the most common standard, and ask how large a sample we need. Then we can look at a table like Table 9-2 on page 268 of Rubin and Babbie, which presents the sampling error at the 95 percent confidence level (i.e., plus or minus 2 standard errors from the parameter) for different sample sizes. We can see that for the binomial distribution we would have to sample 1100 people before we could be 95 percent confident that our estimates fall within plus or minus 3 percent of the parameter.
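A table of this kind can be approximated with a few lines of Python. This is only a sketch of the logic, assuming the 50/50 binomial case and the "2 standard errors" rule from point 9; it does not reproduce the Rubin and Babbie table itself, and the function name sampling_error_95 is an assumption.

```python
import math

def sampling_error_95(n, p=0.5):
    """Approximate sampling error at the 95 percent level: 2 standard errors."""
    return 2 * math.sqrt(p * (1 - p) / n)

# Print the approximate sampling error, in percent, for several sample sizes.
for n in (100, 400, 1100, 2000):
    error_percent = sampling_error_95(n) * 100
    print(f"n = {n:>4}: plus or minus {error_percent:.1f} percent")
```

For n = 1100 this works out to about 3.0 percent, in line with the figure cited above. Note that quadrupling the sample from 100 to 400 only halves the error, from 10 to 5 percent.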

11.  Remember, probability samples only enable you to estimate the probability of the sample being representative. 95 percent confidence means that 5 percent of the time your sample is off to some unknown degree. You can only say that 95 percent of the time it is within the specified range. And you should also remember that you can only say this if the sample adhered to the principle of randomness. Probability samples estimate the probability that the sample is within a range of being representative; they do not guarantee accuracy.

12.  No matter how a sample is chosen, if it is chosen consistent with the principle of random sampling and each element, unit or person has an equal chance of being selected in the sample, then you can estimate the probability that the sample is representative. The most direct way to maintain the principle is to draw units at random in a simple random sample, using for instance a random number table to assign numbers to each person randomly and then select them. Often, because we lack a sampling frame, we cannot draw a simple random sample. This also means that we cannot draw a systematic sample, where every fifth or seventh or nth person is selected. Systematic samples can, if done properly, maintain the principle of randomness. Stratified sampling can also be helpful. In a stratified sample, the population is broken into identifiable groups and sampling then proceeds within each group. Often other more complicated techniques are used, such as a multistage cluster sample, where first broad clusters of units, such as census tracts, are selected, then blocks within those tracts, then houses within those blocks. These techniques are still probability sampling if they maintain the principle of randomness and each person in the population has an equal chance of being selected as an element in the sample.
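The first three designs in point 12 can be sketched in Python. This is a minimal illustration, not a full survey procedure: the client roster, the "program" grouping and all function names are hypothetical, and the stratified version assumes the same sampling fraction is drawn from every group.

```python
import random

def simple_random_sample(frame, n, rng):
    """Draw n units at random, without replacement, from the sampling frame."""
    return rng.sample(frame, n)

def systematic_sample(frame, n, rng):
    """Select every kth unit after a random start, where k = len(frame) // n."""
    k = len(frame) // n
    start = rng.randrange(k)  # random start keeps the selection a chance process
    return [frame[start + i * k] for i in range(n)]

def stratified_sample(frame, stratum_of, fraction, rng):
    """Group units into strata, then draw the same fraction at random from each."""
    strata = {}
    for unit in frame:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, round(len(members) * fraction)))
    return sample

# Hypothetical sampling frame: a roster of 200 clients in two programs
# (150 outpatient, 50 residential).
rng = random.Random(132)  # fixed seed so the run is repeatable
roster = [
    {"id": i, "program": "outpatient" if i % 4 else "residential"}
    for i in range(1, 201)
]

srs = simple_random_sample(roster, 20, rng)
systematic = systematic_sample(roster, 20, rng)
stratified = stratified_sample(roster, lambda c: c["program"], 0.10, rng)

print("simple random:", sorted(c["id"] for c in srs))
print("systematic:   ", [c["id"] for c in systematic])
print("stratified:   ", sorted(c["id"] for c in stratified))
```

All three functions need a roster to draw from, which is why the lack of a sampling frame rules these designs out.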

13.  Nonprobability samples do not maintain the principle of randomness, and we cannot estimate their probability of being representative. These include: purposive samples, where we select people based on a judgment as to their relevance to our study; quota samples, where we select a sample that has the same proportions for key groupings as the population, such as the same racial or ethnic breakdown; availability samples, where the people available are selected; coincidence or accidental samples, where the people who just happen to pass by or show up are selected; and snowball samples, where the initial people selected recommend others to be selected. There are times when these samples are not only the best you can do but also a legitimate basis for conducting exploratory research, even if you cannot really use such samples to make statistical estimates of population parameters.
