Differential item functioning software testing

DIF can occur when one group of individuals responds differently from another group on a given questionnaire item, even though both groups are equivalent on the trait being measured. DIF analysis investigates a differential characteristic of a test item between subpopulations of examinees and is useful in detecting items that may be biased toward a particular subpopulation. Item response theory can be used to detect differential item functioning. Starting from a framework for classifying DIF detection methods and from a comparative overview of the most traditional methods, an R. This chapter focuses on differential item functioning (DIF) and differential test functioning (DTF) methods for dominance models; extending these approaches to ideal point models is an important area for future research. Item bias represents a threat to the validity of test scores in many different disciplines. An introduction to differential item functioning. Multiple ways to detect differential item functioning in SAS, Yan Zhang, Educational Testing Service. Abstract: Differential item functioning (DIF), as an assessment tool, has been widely used in quantitative psychology, educational measurement, business management, and the insurance and healthcare industries. Examining measurement invariance and differential item functioning with discrete latent construct indicators. This approach is useful because it gives us a better sense of test bias than inspection of individual items.

The IRT DIF graphing tool is an Excel spreadsheet designed to help you visualize and evaluate the presence of differential item functioning (DIF) based on item response theory (IRT). Multidimensional item response theory computer software. An item is considered to be biased if the item unfairly favors one group over another. Neither the list of the software nor the studies cited are meant to be. Most of our work has been with achievement tests and binary-scored items, but that. Differential item functioning detection with logistic regression. Rasch analysis with a focus on differential item functioning (DIF) is increasingly used for examination of psychometric properties of health outcome measures. An IRT model is ideally suited for the detection of differential item functioning (DIF) in examining the validity of a test or questionnaire [2].
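To make the IRT view of DIF concrete, the following is a minimal sketch, assuming a two-parameter logistic (2PL) model and made-up item parameters for a reference and a focal group. The function names and parameter values are illustrative assumptions, not taken from any of the tools mentioned above. It shows that when the item is harder for the focal group, examinees of equal ability have different response probabilities.

```python
import math

def icc_2pl(theta, a, b):
    """Probability of a correct (or endorsed) response under a 2PL model.

    theta: ability, a: discrimination, b: difficulty.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical parameters: equal discrimination, but the item is
# harder (larger b) for the focal group. This is uniform DIF.
ref = {"a": 1.2, "b": 0.0}
foc = {"a": 1.2, "b": 0.5}

def icc_gap(theta):
    """Reference-minus-focal probability at the same ability level."""
    return icc_2pl(theta, **ref) - icc_2pl(theta, **foc)

for theta in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"theta={theta:+.1f}  gap={icc_gap(theta):.3f}")
```

With equal discrimination the gap keeps the same sign at every ability level; letting the `a` values differ as well would produce curves that cross, which corresponds to nonuniform DIF.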

The key to developing and maintaining a professionally sound and legally defensible test lies with the guidance of a knowledgeable and experienced psychometrics team. Differential item functioning: SIBTEST. Assessing differential item/bundle functioning (DIF/DBF): SIBTEST implements a nonparametric estimation and hypothesis testing statistical method of assessing DIF in one or more items and/or DBF in one or more bundles of items. Three statistical tests for differential item functioning. We provide a tutorial on differential item functioning (DIF) analysis. In addition, software to implement these methods easily is needed. We employ item response theory ability estimation in our models. Differential item functioning (DIF) is when a test item favors or hinders a characteristic exhibited by group members of a test-taking population. The standardization approach to comprehensive differential item functioning (CDIF) is described and contrasted with the log. Stepwise analysis of differential item functioning based. Differential item functioning (DIF) analysis is a key component in the evaluation of the fairness and validity of educational tests. Differential testing, also known as differential fuzzing, is a popular software testing technique that attempts to detect bugs by providing the same input to a series of similar applications, or to different implementations of the same application, and observing differences in their execution. A differential item functioning analysis of the EQ-5D in cancer.
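The differential testing technique described in the paragraph above (same input, several implementations, compare the outputs) can be sketched as follows. This is an illustration only: the integer square root task, the function names, and the random input range are assumptions chosen for the example.

```python
import math
import random

def isqrt_reference(n):
    """Reference implementation from the standard library."""
    return math.isqrt(n)

def isqrt_candidate(n):
    """Hand-written binary search for floor(sqrt(n)), n >= 0."""
    lo, hi = 0, n
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid * mid <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

def differential_test(impls, inputs):
    """Return (input, outputs) pairs on which the implementations disagree."""
    mismatches = []
    for x in inputs:
        outputs = [f(x) for f in impls]
        if any(o != outputs[0] for o in outputs[1:]):
            mismatches.append((x, outputs))
    return mismatches

# Seeded random inputs plus a few small edge cases.
rng = random.Random(0)
inputs = [rng.randrange(0, 10**12) for _ in range(1000)] + [0, 1, 2, 3, 4]
mismatches = differential_test([isqrt_reference, isqrt_candidate], inputs)
print(f"{len(mismatches)} disagreement(s) found")
```

No oracle for the correct answer is needed: any disagreement between implementations flags an input worth investigating. A float-based shortcut such as `int(n ** 0.5)` would be caught this way on large inputs, where floating-point rounding makes it differ from `math.isqrt`.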

Measurement invariance and differential item functioning. Measuring differential item and test functioning across. Multilevel generalized Mantel-Haenszel for differential item. For each method, a number of the relevant software packages are also introduced. Computer adaptive multistage testing (MSCAT) fits the bill. Several methods have been proposed in recent decades for identifying items that function differently between two or more groups of examinees. The authors used IRTLRDIF software, which relies on the LRT statistic, to test for DIF in. Model testing begins with examination of the statistical.

Examining measurement invariance and differential item. Part I provided an overview of DIF as a concept, and a background for discussing some of the methods used to examine DIF. In general, one of the groups is fixed as the reference. Bias is a judgment that may be made due to the presence of items in a test that show differential item performance by identifiable subgroups of examinees in the population. PSI has over 50 world-class psychometricians and industrial-organizational psychologists who work with a wide range of testing programs and industries. Differential item functioning (DIF) methods can be used to determine whether different subgroups respond differently to particular items within a health-related quality of life (HRQOL) subscale, after allowing for overall subgroup differences in that scale.

Average item scores for subgroups having the same overall score on the test are compared to determine whether the item is measuring in essentially the. Recommendations for conducting differential item functioning (DIF) analyses for students with disabilities based on previous DIF studies, Heather Buzick and Elizabeth Stone. This paper addresses the nature of DIF, methods that can be used to assess the presence of DIF, and how to evaluate DIF once it has been detected. The textbook is focused on item response theory overall, but discusses differential item functioning and item bias. Independent models can be estimated in parallel by defining a parallel object with mirtCluster, which will help to decrease the runtime.
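The idea of comparing average item scores for subgroups matched on total test score can be sketched as a standardized difference in proportions correct. This is a minimal illustration, assuming binary item scores, group labels "ref" and "foc", and focal-group counts as weights (a common choice in the standardization approach). The function names and the toy data are invented for the example.

```python
from collections import defaultdict

def std_p_dif(records):
    """Standardized p-difference for one item.

    records: iterable of (group, total_score, item_score), with
    group in {"ref", "foc"} and item_score in {0, 1}.
    Score levels lacking either group are skipped.
    """
    # counts[score][group] = [number of examinees, number correct]
    counts = defaultdict(lambda: {"ref": [0, 0], "foc": [0, 0]})
    for group, score, item in records:
        cell = counts[score][group]
        cell[0] += 1
        cell[1] += item
    num = den = 0.0
    for cell in counts.values():
        n_r, c_r = cell["ref"]
        n_f, c_f = cell["foc"]
        if n_r == 0 or n_f == 0:
            continue  # no matched comparison possible at this level
        num += n_f * (c_f / n_f - c_r / n_r)  # weight by focal count
        den += n_f
    return num / den if den else float("nan")

def make(group, score, n, n_correct):
    """Helper that expands counts into individual toy records."""
    return [(group, score, 1)] * n_correct + [(group, score, 0)] * (n - n_correct)

# Toy data: at each matched score level the focal group scores lower.
records = (make("ref", 1, 10, 5) + make("foc", 1, 10, 3)
           + make("ref", 2, 10, 8) + make("foc", 2, 10, 6))
print(round(std_p_dif(records), 3))
```

A negative value means that, among examinees with the same total score, the focal group answers the item correctly less often than the reference group.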

The importance of measurement invariance (MI) is even greater when test respondents are compared based on their responses to different items, as is the case in computerized adaptive testing (CAT), because the existence of items that exhibit differential item functioning (DIF) can produce bias within a group as well as between groups. Bock, Muraki, and Pfeiffenberger (1988) proposed a dichotomous item response theory (IRT) model for the detection of differential item functioning (DIF), and they estimated the IRT parameters and the means and standard deviations of the multiple latent trait distributions. Psychometric tools: DIFPACK source and executable code by. The differential attainment of candidates in the various groups does, however, remain, and remains unexplained, but it cannot be attributed to differential item functioning in relation to sex or ethnicity of particular questions in this multiple choice assessment across the range of diets analysed. In more technical terms, differential item functioning (DIF) which. For each of the six combinations of sample sizes by ability matching conditions, 100 replications of DIF detection comparisons were simulated. The results indicated that the IRT-LR test outperformed the MIMIC and logistic. The standardization approach to assessing comprehensive. A line of commonality is equally good at predicting x values from y values, and y values from x values, unlike a regression line, which is only good at predicting one way. It is meant as a post-hoc tool for analysis that is done with other software, such as DFIT. Detecting multidimensional differential item functioning.

To date, however, MSTGen is the only software program that completely fills this role (Han 20). Detection of differential item functioning under the graded. More generally, the results supported the use of IRT-based matching variables in DIF analysis. The Educational Testing Service uses the MH statistics in DIF. Three statistical testing procedures well known in the maximum likelihood approach are the Wald, likelihood ratio (LR), and score tests. Testing differential item functioning in small samples. It highlights methods for testing test items that function differently for different groups. The purpose of the paper is to present and summarise recent advances of.

To demonstrate the value of item response theory (IRT) and differential item functioning (DIF) methods in examining a health-related quality of life (HRQOL) measure in children and adolescents. Differential item functioning (DIF) analyses of health. A differential item functioning analysis of the EQ-5D in. Wainer, Model-based standardized measurement of an item's differential impact. Differential functioning at the test level (aggregated DIF impact) is the sum of differential functioning at the item level and indicates how much each item's compensatory CDIF contributes to DTF of the whole. DIF is a measure of how much harder or easier an item is for a respondent of a given group as compared to respondents from other groups of equal ability. Recommendations for conducting differential item functioning. Differential item functioning for minority examinees on the SAT. Educational Testing Service, May 1988. A Rasch analysis was undertaken using the Winsteps version 3. To take account of DIF in order to retain precision of measurement, splitting DIF items into separate sample-specific items has become a frequently used technique. Using differential item functioning to evaluate potential. Differential item functioning of GRE mathematics items across computerized and paper-and-pencil testing media. The Journal of Technology, Learning, and Assessment, Volume 5, Number 4, December 2006. A publication of the Technology and Assessment Study Collaborative. Caroline A. Good practices for identifying differential item functioning (JSTOR).

Differential item functioning, Columbia University Mailman School. This function runs the Wald and likelihood-ratio approaches for testing differential item functioning (DIF). Differential item functioning for minority examinees on the. An overview of differential item functioning in multistage computer. Logistic regression is one of the generalized linear models in which statistical testing is based on maximum likelihood (ML) estimation. Differential item functioning (DIF) has been increasingly applied in fairness studies in psychometric circles. Detecting differential item functioning using logistic. Although well known, these three testing procedures have not yet been rigorously applied in the logistic regression method to investigate differential item functioning (DIF).

Differential item functioning (DIF) is a direct threat to the measurement invariance (MI) of test items. The logistic regression procedure was run with R software. DIF analyses are statistical procedures used to determine to what extent the content of an item affects the item endorsement of subgroups of test takers. If DIF is found for many items on the test, the final test scores do not represent the same. Differential item functioning of GRE mathematics items across. In international comparisons, one of the threats against validity is country differential item functioning (DIF), also called item-by-country interaction. These additional analyses provided further evidence that the CAT-based DIF procedures performed well. Paper 2900-2015: Multiple ways to detect differential item. An introduction to differential item functioning analysis (ERIC). Item-response-theory-based approaches, which include 1, 2, and 3. In brief, differential item functioning (DIF) occurs when groups, such as those defined by gender, ethnicity, age, or education, have different probabilities of endorsing a given item on a multi-item scale after controlling for overall scale scores.
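One common way to compare endorsement probabilities after controlling for overall scale scores is the Mantel-Haenszel (MH) procedure, which pools a 2x2 table (group by response) from each matched score level. The following is a minimal sketch under stated assumptions: the per-level counts are made up, the function name is hypothetical, and only the common odds ratio and its transformation to the ETS delta scale are computed, not the MH chi-square test.

```python
import math

def mantel_haenszel(strata):
    """Mantel-Haenszel common odds ratio across matched score levels.

    strata: list of (A, B, C, D) counts per score level, where
    A/B = reference group correct/incorrect and
    C/D = focal group correct/incorrect.
    Returns (alpha_MH, MH D-DIF), with MH D-DIF = -2.35 * ln(alpha_MH).
    Raises ZeroDivisionError if no level has both B > 0 and C > 0.
    """
    num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # empty score level contributes nothing
        num += a * d / n
        den += b * c / n
    alpha = num / den
    return alpha, -2.35 * math.log(alpha)

# Hypothetical counts for two matched score levels.
strata = [(5, 5, 3, 7), (8, 2, 6, 4)]
alpha_mh, mh_d_dif = mantel_haenszel(strata)
print(f"alpha_MH={alpha_mh:.3f}  MH D-DIF={mh_d_dif:.3f}")
```

An odds ratio above 1 (equivalently, a negative MH D-DIF) indicates that the item favors the reference group among examinees with the same overall score. A value of 1 indicates no DIF under this measure.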

Differential item functioning (DIF) is typically evaluated in educational and. Judicious application of this methodology by researchers, however, requires an understanding of the technical complexities involved. Modifying measures based on differential item functioning. Recent advances in analysis of differential item functioning. Differential item functioning (DIF) occurs when an item on a test or questionnaire has different measurement properties for one group of people versus another, irrespective of mean differences on.

A simulation study of methods for assessing differential item. Differential item functioning, SAGE Research Methods. Item parameter estimates and likelihood values were obtained by marginal maximum likelihood estimation using the computer program MULTILOG. Three nested ordinal logistic regression models are applied to each item.

If the factor bringing about such a difference is not part of the construct of focus in the test, then the test would be biased. Friesen, associate psychometrician at Paragon Testing Enterprises. Through simulation studies, it is shown that the logistic regression procedure is more powerful than the Mantel-Haenszel procedure for detecting nonuniform DIF and as powerful in detecting uniform DIF. Thayer, A Monte Carlo study of factors that affect the Mantel-Haenszel and standardization measures of differential item functioning. A general framework and an R package for the detection of. Differential item functioning analysis with ordinal logistic. Modeling differential item functioning with group-specific. This article provides a brief primer on differential item functioning (DIF) analysis. Differential item functioning (DIF) is a statistical characteristic of an item that shows the extent to which the item might be measuring different abilities for members of separate subgroups. With the rising concerns over the fairness of language tests, differential item functioning (DIF) has been increasingly applied in bias analysis. More specifically, an item is considered to be biased if two conditions are met. A note on a multiple testing procedure, Tenko Raykov, Dimiter M. Ramsay, Comments on the Monte Carlo study of Donoghue, Holland, and Thayer.
