Validity in software engineering

Do internally valid studies have any value?

Should we replicate more to address the tradeoff between internal and external validity? We asked the community how empirical research should be conducted in software engineering, focusing on the tradeoff between internal and external validity and on replication, complemented by a literature review of the status of empirical research in software engineering.

We found that opinions differ considerably, and that there is no consensus in the community on when to focus on internal or external validity, or on how to conduct and review replications. Besides, I have met very few researchers who remember the name of the first category and what it comprises.

Sadly, we also lose many important subcategories of construct validity that should be taken into account throughout any research project or program, especially the ones promoting validation with real data. Such studies are, of course, demanding and resource-intensive, which implies that they take time. I therefore do not agree that as many validity threats as possible should be listed at the end of each publication together with statements of how they were mitigated, since such a Utopian study does not, and never will, exist.

Instead, I suggest that the method section be reworked until it is as clear as it can be, in line with how Jedlitschka and Pfahl suggest controlled experiments should be designed; these aspects then do not have to be explicitly stated as validity threats, since they are obvious from the method description.

With a well-written method section, threats to validity will be clear in the description of the planning conducted, and statements regarding sample sizes and generalization need not be explicitly stated or repeated in a discussion of validity threats at the end of each paper. As Feldt and Magazinius also conclude, the guidelines given in both Wohlin et al.

I believe a systematic approach to threats to validity is useful in research; however, there is a difference between having a checklist for a research program and including four categories of threats at the end of a paper. A checklist with as many categories as possible should be used, I argue, to better address biases.

Some troublesome studies about research evaluations show that researchers, just like all people, often conduct a biased memory search driven by their motivation. However, people are not biased unless they feel able to justify their conclusions. There is also evidence showing that directional goals may affect the use of statistical heuristics. In a study by. All these studies show that we need a more systematic approach to validity on many different levels.

Generalizations need to be drawn with more care, I argue; even the most extensive empirical studies in software engineering have too small a sample size to state anything definitive about these concepts.
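To make the sample-size point concrete, here is a minimal back-of-the-envelope sketch (my own illustration, not from the original text) using the standard normal approximation for the 95% confidence interval of a proportion; the sample sizes are hypothetical:

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of an approximate 95% CI for a proportion (normal approximation)."""
    return z * math.sqrt(p * (1 - p) / n)

# Worst case p = 0.5; the sample sizes below are hypothetical examples.
for n in (20, 100, 1000, 10000):
    print(f"n={n:>5}: margin of error = ±{margin_of_error(0.5, n):.3f}")
```

Even at n = 1000 the interval is roughly ±3 percentage points, and the much smaller samples typical of software engineering studies fare far worse — one reason to draw generalizations with care.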

It takes a field decades to build up a body of knowledge extensive enough for a meta-study to make such claims of external validity; see, for example, Freeman et al. Validity of research is a thorny issue and of course depends on the research design; however, I believe a larger focus on construct validity is needed in behavioral software engineering, and parts of what I suggest below are also applicable to more general software engineering studies.

I suggest that no categories be used for listing validity threats in research papers, since I believe they are somewhat counterproductive, despite the fact that I have most often done so myself. The more complex validity aspects that need clarification after the reader has read the method section could be discussed under a section called Validity threats, Threats to validity, Limitations, or the like.

Another option is to simply write the threats as a part of the discussion section, but the practical significance of the threats needs to be in focus. What is important is that researchers consider threats to validity throughout their research. To help implement such an awareness, I suggest a checklist below, inspired by seminal work conducted by Messick b in relation to testing validity, but where I also include validity aspects related to specific research studies, as already suggested by Wohlin et al.

Messick a defines validity as follows: These scores are a function not only of the items or stimulus conditions but also of the persons responding as well as the context of the assessment.

In particular, what needs to be valid is the meaning or interpretation of the score, as well as any implications for action that this meaning entails.

This definition implies that we always validate the usage of a test, and never the test itself. I would like to mention here that a test refers to a psychological test. In the quote by Messick, we can see that he advocates a more applied and practical treatment of validity. He also argues that the validity of a test is only one construct, which he calls construct validity.

He writes that different aspects of construct validity can still be presented in order of convenience; however, they remain interrelated both operationally and logically. The six aspects presented are Consequential, Content, Substantive, Structural, External, and Generalizability; to clarify, they all concern the actual measurements, and not, for example, the generalizability of a treatment in a specific experiment.

In addition, reliability is seen as a prerequisite for validity, and external, internal, and conclusion validity in relation to a research study are also included below, following Wohlin et al. However, I think it is of utmost importance to change the culture in software engineering research away from seeing tool construction as the holy grail of research and instead to value validation studies more highly, which then also includes the validation of existing tools.

To build theory, and to make good use of research funding, software engineering researchers need to conduct, ideally, a study of each aspect of construct validity presented below before drawing conclusions about the intended population and the connections between constructs. As already stated, my example is from behavioral software engineering research, but it is partly applicable to closely related subfields of software engineering.

Throughout a research project or program, I suggest the following checklist be used:
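As a sketch of what such a checklist might look like in practice, here is a hypothetical data-structure illustration. The aspect names are taken from the text above (Messick's six construct-validity aspects, plus reliability and the study-level validities from Wohlin et al.); the structure itself is my own assumption, not the author's:

```python
from dataclasses import dataclass

# Aspect names as mentioned in the text: Messick's six construct-validity
# aspects, plus reliability and the study-level validities (Wohlin et al.).
ASPECTS = [
    "Consequential", "Content", "Substantive", "Structural",
    "External (measurement)", "Generalizability",
    "Reliability", "Internal validity", "External validity",
    "Conclusion validity",
]

@dataclass
class ChecklistItem:
    aspect: str            # which validity aspect this item covers
    addressed: bool = False
    notes: str = ""        # how the threat was considered or mitigated

def new_checklist() -> list[ChecklistItem]:
    """Create a fresh checklist with one open item per aspect."""
    return [ChecklistItem(a) for a in ASPECTS]

checklist = new_checklist()
checklist[1].addressed = True  # e.g., content validity reviewed by experts
checklist[1].notes = "Item coverage checked against the construct definition."
open_items = [item.aspect for item in checklist if not item.addressed]
print(f"{len(open_items)} of {len(checklist)} aspects still to address")
```

The point of the structure is merely that every aspect starts as an open item and stays visible throughout the project, rather than being summarized once at the end of a paper.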
