These days, students routinely log on to various freshers' portals to look up the question papers that companies use for entry-level recruitment. Clearly, by some means or other, the questions and tests used in Campus Events find their way onto these portals.
Needless to say, security has always been a concern in any assessment process, and the primary target is the question paper itself. Organisations that use assessments in their recruitment are concerned about security and want their assessments to be unique and distinct from others, to ensure a fair recruitment process. This creates a strong demand for multiple forms of a test, so Test creators need to build different forms of their tests. For example, if a Test creator has built a test on Electronics with components A, B and C, he/she is expected to create several different tests using the same components. This way, test takers cannot know the content of a test before they take it.
In MeritTrac, the demand for unique sets exists because Clients need unique sets for different events, so that strict confidentiality is maintained and questions do not get leaked. This is where Parallel Forms play a major role.
Definition - Parallel forms:
Two or more forms of a test are considered parallel when they have been developed to be as similar to one another as possible in terms of the test specifications and statistical criteria.
Importance of Equivalence in Parallel forms:
If the parallel forms are not equal in terms of content or statistical criteria (p-value and discrimination index), the scores will be affected, and any decision based on those scores will be inaccurate. Hence, we must take proper care before releasing parallel forms, to be fair to the candidates taking the test.
Let me explain this:
Suppose we have a test of 5 items (questions) and we have created 2 forms (sets/papers) of 5 items each. If the items are designed to check Prepositions, then all of them should check only Prepositions and nothing else. That is, the test taker should not need to apply knowledge of any other part of speech to answer these questions; all he/she needs to know is Prepositions. Each item has to be created carefully so that it retains its validity; otherwise, the scores become unfair to the test takers.
Similarly, if the DL (Difficulty Level) of a question is not set after statistically verifying it, but at random based on the Item creator's assumptions, the performance of the test takers will vary. For example, a typical assumption while setting a DL could be: out of 100 test takers who attempt this question, 40 will answer it correctly. In reality, the number could be 60. If a statistical process were stringently followed to derive the DL, it would involve exposing the item to a similar group of test takers: only to College Freshers, or only to Developers with 2 years of experience, but never to a mixed audience comprising both. While conducting such a sampling exercise, the candidate should not be marked/scored on this item; answering it should not affect his/her score in any way. Since our shortlisting criteria are fixed, we cannot have discrepancies in our items in terms of content and DLs, because that would impact both the throughput and the test takers.
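To make that derivation concrete, here is a minimal sketch in Python of computing an item's p-value (facility value) from a pilot sample instead of guessing the DL. The pilot response data below is hypothetical, invented purely for illustration.

```python
def facility_value(responses):
    """Fraction of test takers who answered the item correctly.

    responses: list of booleans, one per test taker
    (True = correct, False = incorrect).
    """
    if not responses:
        raise ValueError("need at least one response")
    return sum(responses) / len(responses)

# Hypothetical pilot: the Item creator assumed 40 of 100 would answer
# correctly (p = 0.40), but the pilot group actually shows p = 0.60.
pilot_responses = [True] * 60 + [False] * 40
print(facility_value(pilot_responses))  # 0.6
```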
Equivalence of test score means, variances, and errors of measurement is interpreted as evidence that tests are parallel.
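As a rough illustration of that check, the sketch below compares the score means and variances of two forms. The score lists are hypothetical; a real study would gather scores from equivalent candidate groups and apply formal significance tests rather than eyeballing the numbers.

```python
from statistics import mean, variance

# Hypothetical total scores (out of 5) from two parallel forms.
form_a_scores = [3, 4, 2, 5, 4, 3, 4, 5, 3, 4]
form_b_scores = [4, 3, 3, 5, 4, 4, 3, 5, 2, 4]

print("Form A:", mean(form_a_scores), variance(form_a_scores))
print("Form B:", mean(form_b_scores), variance(form_b_scores))

# Roughly equal means and variances are one piece of evidence that the
# forms are parallel; in practice a t-test (means) and an F-test
# (variances) would back up the comparison.
```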
Considering the business demands, we might not be able to check the P and DI values before using items in our tests. But we should create a process to monitor each item's performance and tag the question accordingly in terms of statistical criteria. Building a good Question Bank with all the necessary statistical criteria will enable us to create parallel forms.
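One way such a monitoring process could look, as a sketch only: each item in the Question Bank is tagged with its observed p-value and a difficulty band as live responses accumulate. The item IDs and the band thresholds here are hypothetical, not an actual MeritTrac convention.

```python
# Hypothetical Question Bank keyed by item ID, each holding responses
# collected so far (True = answered correctly).
question_bank = {
    "ELEC-101": {"responses": [True, False, True, True]},
    "ELEC-102": {"responses": [True, False, False, False]},
}

for item_id, item in question_bank.items():
    responses = item["responses"]
    p = sum(responses) / len(responses)
    item["p_value"] = p
    # Hypothetical difficulty bands; a real bank would calibrate these.
    item["difficulty"] = "easy" if p >= 0.7 else "medium" if p >= 0.4 else "hard"

print(question_bank)
```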
Key points to remember while building Parallel forms:
- When items are created for a particular content area, multiple checks have to be conducted to ensure that the items belong to the desired field. That is, while building content for Pointers in C, we should not have questions related to Arrays.
- P values (facility values) and Discrimination Index (DI) values should be computed for each item. The index of discrimination is a useful measure of item quality whenever the purpose of a test is to produce a spread of scores reflecting differences in test takers' achievement, so that distinctions can be made among their performances (see the sketch after this list).
- Once we decide on the Blueprint, we need to select items that satisfy the content and cognitive specifications (problem solving, application, knowledge) and the statistical criteria (P value and DI value).
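To tie the last two points together, here is a minimal sketch of the classic upper-lower discrimination index, followed by a blueprint-style filter over a tagged bank. All item data, score ranges, and thresholds are hypothetical.

```python
def discrimination_index(scored, frac=0.27):
    """scored: list of (total_test_score, item_correct) pairs.

    Classic upper/lower group method: compare how often the top and
    bottom groups (default top/bottom 27%) got the item right.
    """
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    n = max(1, int(len(ranked) * frac))
    upper = sum(correct for _, correct in ranked[:n])
    lower = sum(correct for _, correct in ranked[-n:])
    return (upper - lower) / n

# Hypothetical responses: (candidate's total test score, item correct?).
scored = [(9, True), (8, True), (7, True), (6, False), (5, True),
          (4, False), (3, False), (2, True), (1, False), (0, False)]
print(discrimination_index(scored))  # 1.0: upper group always right, lower never

# Blueprint selection: keep items whose tagged statistics fall in the
# desired bands (thresholds are hypothetical).
bank = [
    {"id": "Q1", "topic": "Pointers", "p": 0.55, "di": 0.45},
    {"id": "Q2", "topic": "Pointers", "p": 0.90, "di": 0.10},
    {"id": "Q3", "topic": "Arrays",   "p": 0.50, "di": 0.40},
]
selected = [q for q in bank
            if q["topic"] == "Pointers" and 0.3 <= q["p"] <= 0.7 and q["di"] >= 0.3]
print([q["id"] for q in selected])  # ['Q1']
```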