• To understand the step-by-step process involved in developing a self-administered questionnaire
• To identify strategies for improving the style and comprehension of questionnaire items, and for reducing acquiescent bias
• To promote best practice when establishing the face, content and construct validity and the internal consistency reliability of a self-administered questionnaire
Background Using a structured process to develop a self-administered questionnaire provides a robust tool for collecting data that enhances the credibility of the results. Describing this process reduces the complexity and confusion nurse researchers can experience when sources of information either lack detail or rely on complex statistical approaches.
Aim To discuss the development of a self-administered questionnaire, with a focus on face, content and construct validity and reliability testing.
Discussion Adopting a well-established, sequential, five-step approach ensures that important concepts of questionnaire development are addressed: assessing existing tools and qualitative data, if available; drafting the questionnaire with consideration for question styles, comprehension, acquiescent bias and face validity; expert panel review to establish content validity and inter-rater reliability; pilot testing to assess construct validity; and exploratory factor analysis to establish reliability. This approach results in a robust and credible tool for collecting data.
Conclusion This article provides nurse researchers with a structured process for developing self-administered questionnaires.
Implications for practice Investing time and effort to assess a newly developed questionnaire for validity and reliability and consider question styles, comprehension and acquiescent bias results in an improved and strengthened tool for collecting data. This in turn enhances the quality and credibility of a study’s findings.
Nurse Researcher. doi: 10.7748/nr.2022.e1848
Peer review: This article has been subject to external double-blind peer review and checked for plagiarism using automated software
Correspondence: rebecca.leon@health.nsw.gov.au
Conflict of interest: None declared
Leon RJ, Lapkin S, Fields L et al (2022) Developing a self-administered questionnaire: methods and considerations. Nurse Researcher. doi: 10.7748/nr.2022.e1848
Published online: 31 August 2022
Self-administered questionnaires are widely used in nursing research either as a single tool for collecting data or as part of a collection of tools. A well-developed and validated questionnaire provides a highly effective, inexpensive and efficient method of collecting information such as knowledge, beliefs, attitudes and behaviours (Timmins 2015).
However, significant methodological errors are common in published research, despite many publications and sources of information providing methodological guidelines for developing questionnaires (Chiarotto et al 2018). A factor contributing to methodological errors is that most publications do not provide a clear process for researchers who want to develop a self-administered questionnaire (Timmins 2015, Younas and Porr 2018). The details required to critique a questionnaire's applicability to a study are also often unavailable. This is unsurprising: developing a questionnaire is a time-consuming, iterative process, so important methodological steps are often overlooked or poorly reported. When authors do provide specific details, they tend to focus on complex statistical approaches (DeMars 2018), which can be daunting for nurse researchers or nurses seeking a pragmatic approach.
This paper provides a structured process that was used to develop a self-administered questionnaire in a study exploring the experiences, perceptions and expectations of the enrolled nurse (EN) role in the Australian nursing workforce. The study focused on the Australian context, but the methods used can be applied internationally. It is hoped this article raises awareness of important considerations that enable nurse researchers to develop valid and reliable questionnaires for collecting data.
• A structured approach that incorporates well-established processes is necessary to enhance the quality of a self-administered questionnaire
• There are different methodological considerations, so the researcher should use an approach that is suitable for the intended purpose
• Questionnaire development is an iterative process and requires continual improvement with different samples and settings
A self-administered questionnaire is a systematic method of capturing information at a given time from a specific population (Lapkin et al 2012). It is also the method preferred by behavioural and social sciences researchers, as it can capture psychological and social phenomena that cannot be measured through observation (DeVellis 2017). Time and effort invested in developing a questionnaire are rewarded by a strengthened tool that enhances the quality and credibility of research findings.
Important concepts in the development of a questionnaire are the assessment of face, content and construct validity, and reliability testing. Other factors to consider include question styles, comprehension and acquiescent bias.
For the study described in this article, a self-administered questionnaire was considered the most efficient and cost-effective way to collect data from a large geographical area. The questionnaire was developed as a component of an exploratory mixed-methods study. The benefit of this design was the sequence of the study’s phases, with the qualitative phase conducted first. This enabled the main themes, language and context to be used to inform the development of the self-administered questionnaire.
The structured process to develop a new questionnaire was adapted from DeVellis (2017) and Younas and Porr (2018). It comprised five main steps:
1. Preliminary considerations, including assessing existing tools and qualitative data, if available.
2. Drafting the questionnaire.
3. Review by an expert panel.
4. Piloting the questionnaire.
5. Reliability analysis.
Table 1 includes examples of the application of these steps to the study (adapted from DeVellis 2017).
To ensure terminology was consistent, definitions from the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) taxonomy were used throughout the development of the questionnaire (Table 2, adapted from Mokkink et al 2016).
The steps are detailed as follows:
The use of a previously developed, reliable and validated questionnaire is preferred, as it ensures findings are credible, builds on existing knowledge and enables the findings to be generalised to other populations and settings (Timmins 2015).
A questionnaire should only be developed if no existing questionnaire is available or those that are available are inappropriate.
It is therefore important to start with a comprehensive search of the literature to determine if a questionnaire can be adopted for the study. The main considerations include relevance to the study’s objectives, clarity of the items and evidence of validity (Beatty et al 2019). If a questionnaire or part thereof meets the needs of the study, permission to use it must be sought from the original authors.
The construct of interest and the purpose of the questionnaire have been determined by this point. It is important here to focus on the construct, rather than specific content. This ensures the researcher does not eliminate items too early, which would risk losing the essence of the construct.
The first draft should be comprehensive enough to address the research’s aim and objectives. A strategy to achieve this is to write statements in varying ways, then subtly change words and phraseology. This enables different perspectives to be captured in the items.
The challenge at the first draft stage is not to focus on quality or clarity, nor to be restrictive. An ideal number of items at this stage is three to four times the final number (DeVellis 2017).
The next step is to refine the items into questions, with consideration to the most appropriate question style. Different question styles elicit different information, so it is important to ensure the response options address the research’s aim. Common options include multiple choice, Likert scale and free text questions.
Three aspects need to be incorporated into this step:
The first is comprehension, which can affect the quality of the data. Seven potential comprehension problems have been identified when writing questions: grammatical ambiguity, excessive complexity, faulty progression, vague concepts, vague quantifiers, unfamiliar terms and false inferences (Tourangeau et al 2000). These problems can be avoided by using terminology and language identified in the relevant literature and, if available, by incorporating the language, phrases and context from the qualitative data.
The second is acquiescent bias, which occurs when participants tend to agree with a question or statement regardless of its content. It can be minimised by using a combination of positively and negatively worded items (Groves et al 2009) and by providing an even number of response options in Likert scales.
However, there is increasing evidence that challenges the use of negatively worded questions, such as: ‘I feel ENs are not a valued member of the nursing team.’ Negatively wording questions increases the complexity of their grammar and decreases their readability, which diminishes their potential advantages (Suárez-Álvarez et al 2018). Careful consideration is therefore needed when determining which questions are the most suitable to be negatively worded.
The third is face validity: the degree to which a questionnaire appears to measure the construct of interest. Whether it is appropriate to assess face validity needs to be considered in relation to the construct of interest. It is assessed by reviewing the questions against the research's aim and objectives (DeVellis 2017).
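Returning to acquiescent bias: when positively and negatively worded items are mixed, the negatively worded items are typically reverse-scored before analysis so that higher scores consistently reflect a more positive response. The following is a minimal sketch of this convention in Python with pandas; the item names and data are hypothetical, and reverse-scoring is a common analytic practice rather than a step prescribed by this article.

```python
import pandas as pd

# Hypothetical responses on a 4-point scale
# (1 = strongly disagree ... 4 = strongly agree).
responses = pd.DataFrame({
    "q1_valued": [4, 3, 4, 2],      # positively worded item
    "q2_not_valued": [1, 2, 1, 3],  # negatively worded item, e.g.
                                    # 'I feel ENs are not a valued member...'
})

# Reverse-score negatively worded items: on a 1-4 scale, reversed = 5 - raw,
# so that higher scores always indicate a more positive attitude.
negatively_worded = ["q2_not_valued"]
responses[negatively_worded] = 5 - responses[negatively_worded]

print(responses)
```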
Assessment of content validity and inter-rater reliability by an expert panel will increase the level of trustworthiness in the results. This step provides another opportunity to receive feedback on the comprehension of the questions. It is important that the expert panel represents the main population of interest. Each panel member is provided with a feedback toolkit that includes the draft questionnaire, a rating scale, instructions and some demographic questions so the panel can be described. The panel is asked to use a scale of 'low', 'medium' and 'high' to rate each question on how relevant it is to the research aim, its clarity, its conciseness and whether it is ambiguous. It is also asked to identify any repetition, redundancy or omissions (Willis 2020).
A content validity index (CVI) for each individual item (I-CVI) and an overall scale CVI (S-CVI) are then calculated. For six or more panel members, the recommended I-CVI is 0.80, which demonstrates universal agreement (Polit and Tatano Beck 2006).
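As a minimal sketch of this calculation, the following Python (pandas) example computes the I-CVI for each item and the average scale-level index (S-CVI/Ave) from hypothetical panel ratings. Treating only 'high' ratings as 'relevant' is an assumption made here for illustration; the cut-off is an analytic decision the research team makes in advance.

```python
import pandas as pd

# Hypothetical ratings from six panel members ('low'/'medium'/'high');
# rows are items, columns are experts.
ratings = pd.DataFrame(
    {
        "expert_1": ["high", "high", "medium"],
        "expert_2": ["high", "high", "low"],
        "expert_3": ["high", "medium", "high"],
        "expert_4": ["high", "high", "high"],
        "expert_5": ["medium", "high", "high"],
        "expert_6": ["high", "high", "medium"],
    },
    index=["item_1", "item_2", "item_3"],
)

relevant = ratings.isin(["high"])  # assumption: only 'high' counts as relevant
i_cvi = relevant.mean(axis=1)      # I-CVI: proportion of experts rating the item relevant
s_cvi_ave = i_cvi.mean()           # S-CVI/Ave: mean of all I-CVIs

print(i_cvi)                       # items below 0.80 would be flagged for revision
print(f"S-CVI/Ave = {s_cvi_ave:.2f}")
```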
The questions should be revised and adjusted according to the CVI results. It is recommended that if there are significant changes to the questions in this step, the questionnaire be reviewed by the expert panel again. Although this step may be perceived as delaying the study, it provides assurance about the relevance and clarity of the final questionnaire.
The structure and process of the expert panel also assesses for inter-rater agreement (reliability) through the consistency of the panel members’ responses and ratings for each item.
Piloting the questionnaire assesses it for construct validity and is conducted with a representative sample of the population of interest. Pilot participants are provided with the invitation, the participant information sheet and the questionnaire. They are asked to work through the questionnaire and provide feedback on the method of distribution.
This step provides another opportunity to capture any inconsistencies or concerns about the questions and the instrument itself, as well as any excessive complexity, vague concepts and faulty progression (Tourangeau et al 2000).
Most online survey distribution tools measure the time each participant takes to complete the questionnaire. Potential participants of the main study can be provided with the average time taken by the pilot participants, giving them an informed understanding of the time required to participate in the study, which may assist with recruitment.
Exploratory factor analysis (EFA) is considered the most appropriate method of establishing the reliability of self-report questionnaires (Williams 2010). The aim of this step is to reduce the items into clusters of interrelated items and to evaluate the internal consistency reliability of the questionnaire. Data required for this procedure are obtained after the pilot test by administering the questionnaire to a large sample that is representative of the population of interest.
Reliability analysis is a complex, multivariate analysis procedure that requires the use of statistical software such as SPSS, SAS, Stata and R.
The first step is to test the appropriateness of the data for factor analysis by determining sampling adequacy and verifying that the items are sufficiently intercorrelated. As a general rule of thumb, an absolute minimum for undertaking reliability analysis is 100 respondents (Mundfrom et al 2005). Additionally, the suitability of the data for factor analysis is based on the values of the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy and Bartlett's test of sphericity. KMO values range from 0 to 1; a KMO greater than 0.5 and a statistically significant chi-square (χ²) value for Bartlett's test of sphericity (p < 0.05) are sought to justify the use of EFA (Williams 2010).
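As an illustration of these checks, here is a minimal sketch using the open-source Python package factor_analyzer. The data are randomly generated stand-ins (sized to echo the 253 respondents and 20 Likert items reported later in this article), not real survey responses.

```python
import numpy as np
import pandas as pd
from factor_analyzer.factor_analyzer import calculate_bartlett_sphericity, calculate_kmo

# 'likert' stands in for the real survey export: respondents as rows,
# numeric Likert items as columns (random data here, for illustration only).
rng = np.random.default_rng(42)
likert = pd.DataFrame(rng.integers(1, 5, size=(253, 20)),
                      columns=[f"q{i}" for i in range(1, 21)])

chi_square, p_value = calculate_bartlett_sphericity(likert)
kmo_per_item, kmo_overall = calculate_kmo(likert)

# EFA is justified when the overall KMO exceeds 0.5 and Bartlett's test
# of sphericity is statistically significant (p < 0.05).
print(f"KMO = {kmo_overall:.2f}")
print(f"Bartlett's chi-square = {chi_square:.2f}, p = {p_value:.3f}")
```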
The next step is to extract factors, with the aim of explaining the maximum amount of common variance using the smallest possible number of explanatory constructs. Principal components analysis (PCA) is considered the most psychometrically sound procedure for this process, particularly when no a priori factor structure exists (Egbert and Staples 2019).
An initial step is to examine the correlation coefficients. Items that do not correlate with any other item at 0.30 or above are unlikely to load meaningfully, while excessively high correlations indicate multicollinearity, a situation where the variables are strongly correlated with each other or with the dependent variable, resulting in a less reliable questionnaire (Alin 2010).
Kaiser’s criteria (eigenvalues > 1 rule), the scree test, the cumulative percent of variance extracted and parallel analysis are used to determine the number of factors to retain (Braeken and Van Assen 2017). In most cases, it is necessary to perform several iterations of the PCA, with item reduction achieved by assessing the pattern matrix for items loading poorly onto the extracted factors. A decision can be made, for example, to only retain coefficients if they are equal to or greater than 0.5 and to discard items that cross-load onto two or more factors. The face validity of each of the items loading onto factors during the process must also be assessed.
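Continuing the earlier sketch with the same stand-in 'likert' DataFrame, the following illustrates one extraction iteration with oblique (promax) rotation. Kaiser's criterion is applied via the eigenvalues of the correlation matrix, factor_analyzer's 'principal' extraction method stands in for PCA, and the 0.5 loading cut-off mirrors the example just given.

```python
import numpy as np
import pandas as pd
from factor_analyzer import FactorAnalyzer

# Kaiser's criterion: count eigenvalues of the correlation matrix above 1.
eigenvalues = np.linalg.eigvalsh(likert.corr().to_numpy())
n_factors = int((eigenvalues > 1).sum())

# Extract that many factors with promax rotation and inspect the pattern
# of loadings (items as rows, factors as columns).
fa = FactorAnalyzer(n_factors=n_factors, rotation="promax", method="principal")
fa.fit(likert)
loadings = pd.DataFrame(fa.loadings_, index=likert.columns)

# Flag items with no loading >= 0.5, or items that cross-load on two or
# more factors, as candidates for removal in the next iteration.
strong = loadings.abs() >= 0.5
flagged = (strong.sum(axis=1) == 0) | (strong.sum(axis=1) > 1)
print(flagged[flagged].index.tolist())
```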
Cronbach's alpha (α) is then used to assess the internal consistency reliability of each subscale and the total scale. Values greater than or equal to 0.7 are considered acceptable, while those less than 0.5 are unacceptable (Kılıç 2016). Each item of the subscale must be reviewed to determine whether Cronbach's α for the subscale would be substantially improved if the item were deleted.
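A minimal sketch of the calculation, using hypothetical subscale data; the loop at the end illustrates the alpha-if-item-deleted review described above.

```python
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of totals)."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

# Hypothetical subscale: the items loading onto one extracted factor.
subscale = pd.DataFrame({
    "item_1": [4, 3, 4, 2, 3, 4],
    "item_2": [3, 3, 4, 2, 2, 4],
    "item_3": [4, 2, 4, 1, 3, 3],
})

print(f"alpha = {cronbach_alpha(subscale):.2f}")

# Alpha-if-item-deleted: recompute alpha with each item removed in turn.
for item in subscale.columns:
    print(item, round(cronbach_alpha(subscale.drop(columns=item)), 2))
```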
The last step is to interpret the results and label the identified factors with meaningful names or themes that reflect the theoretical or conceptual intent.
The following section details the application of the structured process in the development of the self-administered questionnaire for the study into the EN role in the Australian nursing workforce.
The literature search identified several studies that used a questionnaire or cross-sectional design to examine issues related to the EN role in the Australian nursing workforce. Further analysis determined none of them used a questionnaire that was relevant or could be adapted for the study. Therefore, developing a questionnaire was warranted.
Findings from the qualitative phase informed the first draft, with the terminology, questions and statements captured directly from the focus groups. An initial pool of 106 items was developed and further refined to a final questionnaire of 49 items: 23 multiple choice questions, 20 Likert scale questions and six free text questions. The analysis of the free text responses was managed in line with the qualitative data; therefore, the challenge of analysis was outweighed by the benefits of enriching the data.
To minimise acquiescent bias, any Likert scales that sought an opinion – for example, ‘I feel ENs are not a valued member of the nursing team’ – were given definitive choices through an even number of response options. Response options were: ‘strongly disagree’, ‘disagree’, ‘agree’, and ‘strongly agree’.
Conversely, if the statement related to something that participants may not have experienced – for example, ‘ENs are being rostered in place of registered nurses, because they are cheaper’ – a midpoint option was provided. Response options were: ‘never’, ‘sometimes’, ‘unsure’, ‘mostly’ and ‘always’.
Careful consideration was given to the incorporation of negatively worded questions, with only two questions written in that structure. It was decided rewording other questions negatively would increase the complexity of the question and grammar structure too much, making the questionnaire less comprehensible.
Once the questionnaire was drafted, the questions were mapped against the construct of interest, research aim and objectives. Together with a review by the research team, this ensured the questionnaire would capture responses that on ‘face value’ met the needs of the study.
Six expert panel members were identified based on their expertise and involvement with ENs, their diverse roles in health services and their geographical locations. Their responses regarding the relevance and clarity of the items were used to calculate the I-CVI and S-CVI. The ratings of conciseness and ambiguity were not used in the calculations but were still valuable, as they assisted panel members in forming and articulating their thoughts and opinions about the questions and the questioning styles. This method is widely used in nursing research (Polit and Tatano Beck 2006).
The I-CVI responses for relevance ranged from 0.75 to 1.00, with a mean score of 0.97. Three questions scored below 0.80. These were reviewed and rephrased accordingly.
Table 3 shows an example question with its score, the panel's response and the researcher's response.
The I-CVI responses for clarity ranged from 0.25 to 1.00, with a mean score of 0.89. Nine questions scored below 0.80. These were reviewed and rephrased accordingly. No repetition, redundancy or omissions were identified.
Table 4 shows an example question with its score, the panel's response and the researcher's response.
The consistency in responses from the expert panel indicated inter-rater reliability. The final questionnaire was drafted and built in the survey platform SurveyMonkey.
Twelve participants were emailed a link to the questionnaire with a request to complete it as per the instructions and to report any concerns in relation to functionality and flow. It was during this process that one health service's firewalls were found to restrict the ability to open the link. This was rectified by contacting the relevant IT department. A benefit of this barrier was that the instructions in the email invitation for requesting a hard copy, and the subsequent process, were tested and required no modification.
Participants took an average of 10 minutes to complete the questionnaire. This information was included in the study’s participant information sheet.
There were no changes to the questions and responses as a result of piloting the questionnaire. It was therefore finalised, ready for distribution.
Participants (n=253) who completed all 20 Likert scale questions were included in the reliability analysis. Sampling adequacy was acceptable, with a KMO measure of sampling adequacy of 0.68 and Bartlett's test of sphericity reaching statistical significance (χ² = 2,289.60, p < 0.001). This demonstrated the data were suitable for factor analysis.
The initial unrotated principal components solution revealed seven factors with eigenvalues greater than one, accounting for 70.86% of the variance in the correlation matrix. The point of inflexion on the scree plot was consistent with the seven-factor solution.
Several iterations of PCA with promax (oblique) rotation were then conducted. Analysis of the results revealed that three items on factor 6 had negative average covariance, so they were excluded from further analysis.
The final iteration resulted in a 15-item, five-factor solution accounting for a cumulative variance of 74.12%. In this solution, each factor retained only items that fitted the specified criteria, all with loadings greater than 0.60.
Analysis of the internal consistency showed moderate to high reliability in all factors: Cronbach's α was 0.85, 0.79, 0.93, 0.89 and 0.59 for the five factors and 0.64 for the total scale. Table 5 provides details of the loadings of each item on the correlation matrix, the mean responses for each item, their labels (names) and Cronbach's α for each subscale.
It is recommended to use or refine an existing questionnaire rather than develop a new one, to ensure findings are credible (Timmins 2015). However, if one does not exist, using a structured process to develop a questionnaire, as described here, ensures important concepts such as comprehension, question styles and acquiescent bias are addressed. Success involves striking a balance between reducing the burden on participants and collecting comprehensive data.
In this study, content validity was established by quantifying responses from the expert panel and calculating the CVI (Polit and Tatano Beck 2006). These ratings were used to determine the accuracy, clarity and appropriateness of the items. There are other methods for assessing content validity; the key is to find one that is available to the researcher and provides a level of confidence in the calculations.
The benefit of pilot testing is highlighted by the fact that issues related to firewalls were identified and rectified before administering the questionnaire. This was important in enhancing the credibility and dependability of data collection.
Reliability testing was established by using EFA to identify the number of constructs and the underlying factor structure. This resulted in a parsimonious, 15-item, five-factor questionnaire with adequate internal consistency and construct validity. Reliability testing is an important step in developing questionnaires, as it provides the means to establish whether the number of factors found in one sample can be replicated in another sample from the same population or across different populations.
Additional time and effort are required to follow the sequential steps necessary to develop and validate a self-administered questionnaire. While this may seem unnecessary and unappealing, the final product is a questionnaire that captures more valid and reliable data, which reinforces the value of the time invested. Using a structured process helps to accommodate the additional work, as it prepares and guides the researcher through the required steps.
This article is limited to the development of a self-administered questionnaire. It does not consider issues related to further psychometric analysis of measurement properties, such as Rasch analysis, informed by classical test theory and item response theory. Questionnaire design is also an iterative process and further refinements could be made in future studies.
Self-administered questionnaires are a common method of collecting data in nursing research, but their development can be daunting. It is hoped that this article’s detailed description of a structured process supplemented with an example will support researchers to work through the development of a self-administered questionnaire. The investment of time and effort is rewarded with a robust tool that enhances the credibility of a study’s results.