# Sampling research/sample

The group that is the focus of the study (the target group) is called the **fundamental set** or **population**.

If all the statistical units of the population are studied, the study is called **overall research**. Overall research is possible when information can be gathered from every unit of observation, that is, when the population is not very big.

If the population is large, only part of it is studied. The part of the population that is studied is called a **random sample**, and all conclusions concerning the population are drawn from this sample. This approach is called **sampling research**. A random sample meets the following conditions:

- All units of observation are selected **randomly**
- All observation units selected must belong to the population
- All members of the population have an equal chance of being selected in the random sample

The sample must be representative. This means that the selected sample has the same features in the same proportions as in the population (e.g. same age and regional distributions in the sample and population). Since the results of the random sample will be generalised to represent the whole population, **statistical tests** are used to study the effects of chance on the results.

**The most common sampling methods** are:

- Simple random sampling
- Systematic sampling
- Stratified sampling
- Cluster sampling
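The four methods listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population, the `region` attribute used as the stratum, the cluster size of 50 and all function names are hypothetical and do not come from the text.

```python
import random

def simple_random_sample(population, n, rng):
    """Every unit has the same chance of selection; no unit is drawn twice."""
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    """Pick a random start, then take every k-th unit (k = N // n)."""
    k = len(population) // n
    start = rng.randrange(k)
    return [population[start + i * k] for i in range(n)]

def stratified_sample(population, stratum_of, fraction, rng):
    """Draw the same fraction at random from each stratum."""
    strata = {}
    for unit in population:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for units in strata.values():
        sample.extend(rng.sample(units, round(len(units) * fraction)))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    """Draw whole clusters at random and keep every unit in the chosen clusters."""
    chosen = rng.sample(list(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

rng = random.Random(42)  # fixed seed only so that the sketch is repeatable

# Hypothetical population of 1000 units: 300 in the north, 700 in the south.
population = [{"id": i, "region": "north" if i < 300 else "south"}
              for i in range(1000)]

srs = simple_random_sample(population, 100, rng)
sys_sample = systematic_sample(population, 100, rng)
strat = stratified_sample(population, lambda u: u["region"], 0.1, rng)

# Hypothetical clusters: 20 groups of 50 consecutive units each.
clusters = {c: [u for u in population if u["id"] // 50 == c] for c in range(20)}
clus = cluster_sample(clusters, 2, rng)
```

In this sketch the stratified sample keeps the regional proportions of the population (30 % north, 70 % south), which is one way of supporting the representativeness requirement described earlier.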

**The size of the sample**

The sample size must be large enough to obtain reliable results. Loss must be taken into account: some questionnaires may not be returned, and some completed forms may have to be rejected because they were not filled in properly. There are models that indicate the ideal sample size, but the following list gives some reference values for the number of answers needed.

The number of answers received should be **at least**:

- 100 if the population is narrow and the outcomes will be studied at an overall level
- 200 – 300 if the research is focussing upon different groups within the population (e.g. a comparison); each group should contain at least 30 statistical units
- 500 – 1000 in a national consumer study
- 1000 in a study of support of political parties
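As an illustration of the "models indicating the ideal sample size", the sketch below uses Cochran's formula with a finite population correction. The text does not name a specific model, so this choice is an assumption, as are the function names, the 5 % margin of error, the 95 % confidence level (z = 1.96) and the 60 % response rate. The second function inflates the number of questionnaires sent to allow for loss.

```python
import math

def cochran_sample_size(population_size, margin_of_error=0.05, z=1.96, p=0.5):
    """Answers needed to estimate a proportion (Cochran's formula).

    p = 0.5 is the most cautious assumption about the true proportion.
    """
    n0 = (z ** 2) * p * (1 - p) / margin_of_error ** 2
    # Finite population correction: small populations need fewer answers.
    n = n0 / (1 + (n0 - 1) / population_size)
    return math.ceil(n)

def questionnaires_to_send(answers_needed, expected_response_rate):
    """Inflate the sample so that the expected loss is covered."""
    return math.ceil(answers_needed / expected_response_rate)

answers_needed = cochran_sample_size(population_size=10_000)
to_send = questionnaires_to_send(answers_needed, expected_response_rate=0.6)
print(answers_needed, to_send)
```

The response rate is a guess made before the study; if fewer forms come back than expected, the reference values in the list above still apply to the answers actually received.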

## Questionnaires

The person who compiles the questionnaire must be well aware of the aims of the study and of the research problem before starting. A good questionnaire requires time and effort, and a clear, well-planned and well-thought-out questionnaire is a prerequisite for good research.

**Instructions:**

- clear instructions for how to fill it in
- questions numbered in consecutive order
- questions that are in a logical order
- the questions can be grouped according to subject under headings
- one question asks about one issue, not more than one
- a sufficient number of background variables, selected according to the nature of the study in question
- when thinking about how to structure your questions you should already consider which statistical methods you are going to use (take into account the scale of the variables)
- ask for information so that the information on the form is easy to code and process using statistics software
- the form must not be too long
- the overall first impression of the form must be that it is clear, inviting and concise
- when ready-made answers are provided, all possible answers should be offered, including an extra choice “other, what?”

Statements are often answered on a graduated scale, for example:

- I completely agree, I mostly agree, I neither agree nor disagree, I mostly disagree, I completely disagree
- Excellent, Good, Satisfactory, Bad, Very bad

Answers on such scales are often coded with the numerical values 5, 4, 3, 2 and 1. If it can be assumed that the person answering the questionnaire perceives the distance between adjacent choices as equal, the graduated scale can be interpreted as an interval scale. In this case it is possible to calculate the average and standard deviation of the variable. The choice of answers should include “issue not known” in case the person answering is not familiar with the subject of the question. This answer is also given a numerical value, but the “issue not known” answers must be removed before parameters are calculated.
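A minimal sketch of this coding step is shown below. The code `0` for “issue not known” and the ten answers are assumptions made for the example; the text does not say which numerical value is used.

```python
from statistics import mean, stdev

NOT_KNOWN = 0  # hypothetical code for "issue not known"

# Hypothetical coded answers to one statement (5 = completely agree ... 1 = completely disagree).
answers = [5, 4, 4, 0, 3, 5, 2, 0, 4, 1]

# Remove "issue not known" before calculating parameters.
valid = [a for a in answers if a != NOT_KNOWN]

avg = mean(valid)
sd = stdev(valid)  # sample standard deviation (divides by n - 1)
print(f"n = {len(valid)}, average = {avg:.2f}, standard deviation = {sd:.2f}")
```

Leaving the `0` codes in would pull the average down, because the code would then be treated as an opinion more negative than “completely disagree”.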

## Describing and analysing the material and interpretation and reporting of the results

The type of questionnaire used for a specific research problem depends on the statistical analysis methods that are used during the study.

**The presentation and examination of the results in the report:**

- Description of the material
- The relationships and dependencies between variables
- Statistical conclusions
- The generalisation of the results in terms of the population (test of hypotheses)

### Description of the Material in the Report

The distribution of the values of the variables is often presented first in a report:

- as a table
- as a graph
- in text
- using summary statistics

The text and the diagram (or the text and the table) complement each other. The text interprets the most important points of the diagram or graph but does not list every value. When the distribution of a variable is described concisely using a few numbers, summary statistics such as the average, standard deviation, median and quartiles are used.

The results are presented according to topic, e.g. background variables, informing, etc. The distribution of variable values can also be examined within groups that are important to the research, e.g. by gender, department or age group. Groups that fulfil several conditions at once can also be examined, e.g. fifth grade boys.
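The summary statistics named above can be computed overall and within groups as in the following sketch. The responses, the grouping by gender and the `describe` helper are hypothetical; the quartiles use the default ("exclusive") method of Python's `statistics.quantiles`, and other software may use a slightly different definition.

```python
from statistics import mean, median, quantiles, stdev

def describe(values):
    """Return the summary statistics used to describe a distribution."""
    q1, _, q3 = quantiles(values, n=4)  # lower quartile, median, upper quartile
    return {
        "n": len(values),
        "mean": mean(values),
        "sd": stdev(values),
        "median": median(values),
        "q1": q1,
        "q3": q3,
    }

# Hypothetical data: (background variable, score on a 1-5 scale).
responses = [
    ("female", 4), ("female", 5), ("female", 3), ("female", 4),
    ("male", 2), ("male", 3), ("male", 5), ("male", 2),
]

# Overall distribution.
overall = describe([score for _, score in responses])

# The same statistics within each group of the background variable.
by_group = {}
for gender, score in responses:
    by_group.setdefault(gender, []).append(score)
per_group = {gender: describe(scores) for gender, scores in by_group.items()}

print(overall)
for gender, stats in per_group.items():
    print(gender, stats)
```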

### The relationships/dependencies between variables

When examining the relationships between variables the following can be used:

- Cross-tabulation (sometimes with calculation of a contingency coefficient)
- Correlation coefficients
- Regression analysis

It is usual to examine the connection between two variables at a time, i.e. in pairs:

- If both variables are at least interval scale variables, **Pearson’s correlation coefficient** can be calculated to describe their linear dependency.
- If the variables are ordinal scale variables (or one is an ordinal scale variable and the other is at least an ordinal scale variable), it is possible to calculate **Spearman’s rank correlation coefficient**.
- If the variables are dummy variables, their relationship is examined using cross-tabulation. The strength of the dependency can in this context be examined using a **contingency coefficient** (not very common).

**When examining dependencies, pay attention to deviating observations (outliers), because they affect the results.**
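The two correlation coefficients, and the effect of a single deviating observation, can be illustrated with the sketch below. It is written in plain Python so that the calculation is visible. The data and function names are made up for the example, and statistics software would normally be used instead.

```python
from math import sqrt

def pearson(x, y):
    """Pearson's correlation coefficient: strength of linear dependency."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """Ranks 1..n; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman's rank correlation: Pearson's coefficient of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical data: y grows with x, but not along a straight line.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print("Pearson: ", round(pearson(x, y), 3))
print("Spearman:", round(spearman(x, y), 3))

# One deviating observation changes both coefficients noticeably.
x_out, y_out = x + [6], y + [0]
print("Pearson with outlier: ", round(pearson(x_out, y_out), 3))
print("Spearman with outlier:", round(spearman(x_out, y_out), 3))
```

Plotting the observations before calculating a coefficient is a simple way to spot such deviating observations.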

### Statistical Conclusions and how to generalise the results in terms of the population

It is then necessary to consider the information obtained from the material in order to draw conclusions. Such conclusions are based on the sample if sampling has been used. These sample outcomes can be used to describe general trends in the overall population. Before doing so, you must check that the observed outcomes (the dependencies between variables or the differences between groups) are unlikely to be due to chance alone. Statistical testing is used to check this.

Statistical testing is used to find out whether certain preconceptions, statements or hypotheses hold in a certain population. Hypotheses are necessary to the research because they are needed to solve the research problem. Different issues are tested using different tests. Before using a test, you must check that the conditions for its use are met.

**Test theory and examples of the most common tests:**

- Statistical testing
- Testing the relationships/dependencies between variables
- χ² independence test (cross-tabulation)
- Correlation coefficient testing
- Regression line formation and testing
- χ² compatibility test
- Testing the normality of distribution
- Examples of average testing
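As one example from this list, the χ² independence test for a cross-tabulation can be sketched as follows, together with the contingency coefficient mentioned earlier. The 2 × 2 table is invented for the illustration. The p-value shortcut with `erfc` is valid only for one degree of freedom, and a common condition for the test is that the expected frequencies are not too small (often taken as at least 5).

```python
from math import erfc, sqrt

def chi_square_independence(table):
    """Return (chi2, degrees of freedom, n) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof, n

def contingency_coefficient(chi2, n):
    """Contingency coefficient C = sqrt(chi2 / (chi2 + n))."""
    return sqrt(chi2 / (chi2 + n))

# Hypothetical cross-tabulation: rows = two groups, columns = yes / no.
table = [[30, 20],
         [20, 30]]

chi2, dof, n = chi_square_independence(table)
c = contingency_coefficient(chi2, n)

# For dof == 1 only: P(chi-square > chi2) = erfc(sqrt(chi2 / 2)).
p_value = erfc(sqrt(chi2 / 2))
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_value:.4f}, C = {c:.3f}")
```

If the p-value falls below the chosen risk level (commonly 5 %), the hypothesis that the two variables are independent in the population is rejected.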