Multiple Object Tracking and Attentional Processing
CHRISTOPHER R. SEARS, University of Calgary
ZENON W. PYLYSHYN, Rutgers University
Abstract How are attentional priorities set when multiple
stimuli compete for access to the limited-capacity visual
attention system? According to Pylyshyn (1989) and Yantis
and Johnson (1990), a small number of visual objects can be
preattentively indexed or tagged and thereby accessed more
rapidly by a subsequent attentional process (e.g., the tradi-
tional "spotlight of attention"). In the present study, we
used the multiple object tracking methodology of Pylyshyn
and Storm (1988) to investigate the relation between what
we call "visual indexing" and attentional processing. Partici-
pants visually tracked a subset of a set of identical, independ-
ently randomly moving objects in a display (the targets), and
made a speeded identification response when they noticed a
target or a nontarget (distractor) object undergo a subtle
form transformation. We found that target form changes
were identified more rapidly than nontarget form changes,
and that the speed of responding to target form changes was
unaffected by the number of nontargets in the display when
the form-changing targets were successfully tracked. We also
found that this enhanced processing only applied to the
targets themselves and not to nearby nontarget distractors,
showing that the allocation of a broadened region of visual
attention (as in the zoom-lens model of attentional alloca-
tion) could not account for these findings. These results
confirm that visual indexing confers a processing priority on
a small number of objects in the visual field.
It is widely accepted that visual attention can be shifted from
one location to another independently of eye movements,
and that the processing of stimuli appearing at attended
locations is enhanced. The methodological paradigm that
produced much of the evidence for this is the attentional
cueing procedure (Posner, Snyder, & Davidson, 1980). In a
cueing experiment, visual attention is shifted to a predeter-
mined location either endogenously, in which the shift is
under the volition of the observer, or exogenously, in which
the shift is involuntarily elicited by a highly salient cue. A
substantial number of studies have demonstrated that the
speed and accuracy of processing at cued locations is supe-
rior to that at uncued locations. For example, Downing
(1988) found that perceptual sensitivity was enhanced at the
location of a cue, and concluded that focused attention
serves to facilitate visual information processing.
Previous research suggests that, whether controlled
endogenously or exogenously, there can be only one focus
of attention at any one time. Posner, Snyder, and Davidson
(1980) provided evidence that visual attention is allocated to
single contiguous regions of the visual field, enhancing the
processing of stimuli falling within the single contiguous
"spotlight." Eriksen and St. James (1986) subsequently
showed that this enhanced processing falls off
monotonically as one moves out from the locus of visual
attention, and that the resolution of the spotlight varies
inversely with the size of the region encompassed (the
"zoom-lens" model). Many investigators have concluded that
the spotlight is the primary processing bottleneck of the
attentional system, as only stimuli falling within this region
undergo extensive perceptual analysis (e.g., Eriksen &
Hoffman, 1974; Yantis & Johnston, 1990), and only one
such region can be attended to at any one time (Eriksen &
Yeh, 1985). There is also evidence that focal attention may
be directed at objects in the field of view rather than at
spatial regions (Baylis & Driver, 1993) and that the objects
selected in this way continue to be identified as the same
objects as they move about (Kahneman & Treisman, 1992;
Pylyshyn, 1989).
Although the evidence for the unitary nature of focal
attention is somewhat contentious (e.g., Castiello & Umilta,
1990; Juola, Bouwhuis, Cooper, & Warner, 1991; Pashler,
1998), there do seem to be severe limitations on the alloca-
tion of focal attention. Thus, when multiple stimuli compete
for attention, there must be some mechanism that prioritizes
the stimuli in some way so that stimulus selection can
proceed in an efficient manner. Consequently, an under-
standing of the means by which attentional priorities are
assigned is of some importance.
Yantis and Johnson (1990) sought to determine how
attentional priorities are set under conditions in which
attention is allocated in a stimulus-driven manner. They had
participants search for a target letter in static multielement
displays composed of multiple abrupt onset and no-onset
items. Abrupt onset and no-onset items are distinguished by
their attentional saliency: No-onset items are presented by
removing camouflaging line segments from an existing
object in the display until the target is revealed, whereas
onset items abruptly appear in previously empty locations
of the display. In previous studies, Yantis and his colleagues
(Jonides & Yantis, 1988; Yantis & Jones, 1991; Yantis &
Jonides, 1984) found that single abrupt onsets automatically
capture attention, and that abrupt onset items are processed
before other no-onset items in a display. Yantis and Johnson
(1990) found that when multiple abrupt onsets are present,
a limited number of them (approximately four) can be
processed before other no-onset items. They proposed that
attentional priorities are set by means of attentional "tags"
that are bound to the representations of highly salient
objects (such as abrupt onsets), and that a limited number of
abrupt onsets can be attentionally tagged and then given
priority access to focused attention.
Like Yantis' priority tag model, Pylyshyn's (1989) FINST
model of visual indexing provides a means for setting
attentional priorities among multiple stimuli. In the FINST
model, a limited number of objects can be preattentively and
simultaneously indexed independently of their retinal
locations or identities (FINST is an acronym for Fingers of
iNSTantiation, a reference to the idea that these indexes
point to objects and provide a way to bind objects to
internal symbolic arguments). Pylyshyn (1989) suggested
that a primary function of visual indexing is to individuate
a small number of objects so that they may be directly
accessed and subjected to focused attentional processing.
Indexing provides direct access to the objects, so that once
an object is indexed it is not necessary to use attentional
scanning to find that object (Pylyshyn et al., 1994). In
contrast, in order to selectively attend to a nonindexed
object, its position must first be ascertained through
attentional scanning (e.g., Treisman & Gelade, 1980; Tsal &
Lavie, 1993). Visual indexing thus provides a means of
setting attentional priorities when multiple stimuli compete
for attention, as indexed objects can be accessed and at-
tended before other objects in the visual field.
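To make the notion of a small pool of object-bound indexes concrete, here is a minimal Python sketch (our own illustration, not part of the FINST model's specification; all names are invented). It represents the indexes as a fixed number of slots that bind to object identities rather than to locations, so that an indexed object can be queried directly, without search:

class IndexPool:
    """Toy FINST-like index pool: a small, fixed number of slots that
    bind to object identities, not to retinal locations."""

    def __init__(self, capacity=4):
        self.capacity = capacity      # roughly four or five indexes
        self.bound = []               # objects currently indexed

    def assign(self, obj):
        """Bind an index to a salient object if a slot is free."""
        if obj not in self.bound and len(self.bound) < self.capacity:
            self.bound.append(obj)
            return True
        return False                  # no free index: the object is not indexed

    def indexed(self, obj):
        """Direct access: no attentional scan is needed to query an
        indexed object, no matter where it has moved."""
        return obj in self.bound


pool = IndexPool()
for target in ["obj_3", "obj_7", "obj_11"]:
    pool.assign(target)
print(pool.indexed("obj_7"))   # True, even after obj_7 changes position
print(pool.indexed("obj_2"))   # False: locating it would require attentional scanning

The crucial property illustrated here is that the binding is to the object itself rather than to a stored location, which is why an indexed object remains accessible as it moves.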
In the FINST model, visual indexes are assigned primarily
in a stimulus-driven manner, so that salient feature charac-
teristics or changes are automatically indexed. Typical
stimulus events that would be indexed roughly correspond
to stimuli that automatically attract focal attention, such as
peripheral cues (Posner & Cohen, 1984), abrupt visual
events such as onset stimuli (Yantis & Jonides, 1984), and
"pop-out" visual features (e.g., a red circle embedded in a
display of black circles; Treisman & Gelade, 1980).
Burkell and Pylyshyn (1997) have provided more direct
evidence of the ability of several simultaneous onsets to
attract indexes which could then be used to access and
examine the indexed items. They used Treisman-type visual
search tasks to study the selection of search subsets by onset
cues, and they appealed to the fact that visual search behav-
iour depends on the nature of the set through which one has
to search. In the original visual search studies, Treisman and
Gelade (1980) showed that search targets that differed from
each nontarget in the search set by a single feature (called the
"single-feature" search condition) were easily recognized and
the time it took to recognize them was very nearly inde-
pendent of the number of nontargets in the search set. In
contrast, displays in which each nontarget shared some
feature with the target, so that only a combination of two
features defined the target (called the "conjunction-feature"
search condition), resulted in generally slower searches as
well as reaction times that increased substantially with
increasing numbers of nontargets in the display. Burkell and
Pylyshyn used displays which were of the "conjunction
search" type. However, they selected subsets through which
participants were required to search, and the subsets could
constitute either a single-feature or a conjunction-feature
search.
The subsets were selected as follows. All members of the
search set were precued by "X" place holders, but a subset of
variable size that was to contain the target was cued by
placeholders that occurred about one second later than the
other placeholders, and about 100 ms before the search
began. The precued subset itself could constitute either a
single-feature or a conjunction-feature search task (i.e., the
target could be distinguished from members of the subset by
one feature or only by a conjunction of two features).
Burkell and Pylyshyn found that a subset precued by these
late onset cues could, for the purposes of speeded visual
search, be separated from the rest of the display and treated
as though they were the only items present. Burkell and
Pylyshyn showed, for example, that only the single-feature
versus conjunction-feature property of the subset was
relevant to the pattern of search reaction times (the overall
display always provided a conjunction-search condition
because it contained items with all combinations of proper-
ties). Moreover, they found that increasing the relative
distance between search targets did not increase the search
time, showing that the indexed targets could be accessed
without having to find them first by scanning the display.
Visual indexes point to features or objects, not to the
locations that these stimuli occupy. Like the object files of
Kahneman, Treisman, and Gibbs (1992), visual indexes are
object-centred and continue to reference objects despite
changes in their location. According to the visual indexing
model, a visual index automatically individuates and tracks
moving objects. Because there are a small number (around
four or five) of such indexes, observers can track around
four or five independently moving distinct visual objects in
parallel. In an empirical test of this hypothesis, Pylyshyn
and Storm (1988) had participants visually track a
prespecified subset of a larger number of identical, randomly
moving objects in a display. The members of the subset to
be tracked (the targets) were identified by briefly flashing
them several times, prior to the onset of movement. Accord-
ing to the model, targets designated in this fashion are
automatically indexed. During the tracking task, the targets
were indistinguishable from the other distractor objects,
which made the historical continuity of each target's motion
the only clue to its identity. Participants tracked the target
objects for 5 to 10 seconds, after which either a target or a
distractor was probed by superimposing a bright square over
it. The participants' task was to determine whether the
probed object was a target or a distractor. According to
Pylyshyn and Storm (1988), the indexing of the target
objects would allow each of them to be simultaneously
tracked and identified throughout the motion phase of the
experiment, despite the fact that the targets were perceptu-
ally indistinguishable from the distractors.
Pylyshyn and Storm (1988) found that performance in
this multiple object tracking task was extremely high for
subsets of up to five elements — in fact, participants could
simultaneously track up to five target objects at an accuracy
approaching 90%. McKeever and Pylyshyn (1993), Yantis
(1992), Scholl and Pylyshyn (1999), Viswanathan and
Mingolla (1989), Cavanagh (1999), and Culham et al. (1998)
have all reported similar results. Moreover, using a simula-
tion of the task, analyses by Pylyshyn and Storm (1988) and
McKeever and Pylyshyn (1993) indicate that a single
spotlight of focused attention moving rapidly among the
target objects and updating a record of their locations could
not produce this level of tracking performance in the setup
used in their study. In the Pylyshyn and Storm (1988)
simulations, for example, tracking performance was no
higher than 50%, even under extremely conservative assumptions.
One such assumption was an attentional scan velocity
as high as 250 degrees per second (the highest estimated scan
velocity in the previous literature). Another assumption was
that participants stored (with zero encoding time) the
predicted locations based on the direction and speed of the
targets' motion, and that they used a guessing strategy when
they were uncertain. These results suggest that it is not
possible for the task to be performed at the observed level of
accuracy without some parallel tracking of the target
objects. Pylyshyn and Storm (1988) concluded that their
results confirmed a key prediction of the visual indexing
model, namely, that a small number of objects can be
indexed and that the indexes are used to keep track of them
in parallel without attentional scanning and without
encoding their locations.
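As a rough, back-of-the-envelope illustration of the quantities involved in this argument, consider the following Python arithmetic; the inter-target distance and object speed below are our own illustrative assumptions, not values taken from Pylyshyn and Storm (1988).

# Illustrative arithmetic only; the hop distance and object speed are assumed.
scan_velocity = 250.0   # deg/s, the conservative upper bound on attentional scanning
n_targets = 5
mean_hop = 7.0          # assumed average distance between successively visited targets (deg)
object_speed = 6.0      # assumed object speed (deg/s)

cycle_time = n_targets * mean_hop / scan_velocity   # time for one full sweep over the targets
drift = object_speed * cycle_time                   # distance each object moves during that sweep
print(f"cycle time: {cycle_time * 1000:.0f} ms; drift per sweep: {drift:.2f} deg")

Whether drift of this magnitude produces target-distractor confusions depends on the density of the display and on how unpredictably the objects change direction, which is what the reported simulations evaluated.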
We should note that Yantis (1992) has added an addi-
tional mechanism to explain how multiple target objects are
tracked in this task. Yantis (1992) argued that participants
spontaneously group the targets together to form a virtual
polygon, whose vertices correspond to the continually
changing positions of the targets, and that it is this single
"object" that is tracked throughout the trial. While it may
be that observers conceptually group elements into a
polygon, it is still the case that the individual targets them-
selves must be tracked in order to keep track of the location
of the vertices of such a virtual polygon. Consequently, we
do not consider Yantis's (1992) account to be incompatible
with Pylyshyn and Storm's (1988) analysis. Indeed, we have
offered a closely related proposal for what we call an "error
recovery" stage of the tracking process, wherein a polygon-
like representation of the relative location of targets is
maintained and referred to when the loss of a target is
detected (McKeever & Pylyshyn, 1993; Pylyshyn et al.,
1994).
The purpose of the present investigation was to explore
the relation between visual indexing and attentional process-
ing in the context of the multiple-object tracking paradigm.
According to Pylyshyn (1989), visual indexing provides a
means of keeping track of a number of objects, in the sense
of providing a means for querying them, without first
having to ascertain their positions through attentional
scanning. We assume that, in order to focus unitary atten-
tion on an object, one must first find the object and move
focal attention to it. However, if the object has already been
indexed, focal attention can be allocated to it directly,
without prior search. Consequently, shifting attention to
indexed objects should be faster than shifting attention to
nonindexed objects. Because of this we expect that in a task
requiring focal attention, a response to an indexed object
will generally occur before a response to other objects in the
visual field.
This hypothesis was investigated in Experiment 1.
Participants tracked a set of target objects and made a
speeded identification response when they detected that a
target or a distractor object underwent a subtle form
transformation. According to our hypothesis, targets are
indexed during the tracking task, and therefore unitary focal
attention can be moved directly to them. If focal attention
is shifted to a target, either on some regular basis or when
some global change-detector reports a change, then any
change at that target is more readily recognized. Thus, if
targets undergoing changes are found to be more rapidly
identified, this would support our hypothesis that visual
indexing facilitates attentional processing.
Experiment 1
METHOD
Participants. Seventeen individuals participated in this
experiment. All had normal or corrected-to-normal vision,
and were paid $15 for their participation.
Apparatus and stimuli. Stimuli were presented on a 19-inch
Sun MicroSystems monochrome monitor with a resolution
of 1,056 by 900 pixels. A Sun MicroSystems 3/50 microcom-
puter controlled the stimulus presentation and randomiza-
tion of trials. Response times were collected with a three-
button mouse and a dedicated Zytec timing board (Danzig,
1988) which provided response latencies accurate to one
millisecond.
Stimuli consisted of rectangular, seven-segment box figure
eights (8), and the capital letters E and H. The E and H
characters were created by removing lines from the figure
eights. The line segments used to construct these stimuli
were a single pixel in width. The E and H figures were used
because of their high degree of similarity to the figure eight
character.
All the stimuli were created as pixel-drawn "objects" in
video memory that could be moved across the screen
without being continually recreated. Each object subtended
a visual angle of 0.84° in height and 0.63° in width from a
viewing distance of 50 cm. Stimuli were white and were
presented on a dark background.
The animation of the objects in the display was con-
trolled by an algorithm which simulated Brownian motion,
creating a random, independent, and continuous pattern of
movement for each object. Motion sequences were generated
in real-time during each trial and all the participants re-
ported the perception of smooth and continuous motion
during the practice trials. Each object was surrounded by an
invisible circular barrier which ensured that no two objects
could collide or superimpose themselves over one another.
Each object moved at a randomly determined velocity of
between 4° and 8° per second. Once determined, the
velocity of each object did not vary throughout a trial. A
wall repulsion force retained all the objects within a 15° by
15° area by bouncing them off these invisible borders.
Although the total number of target and distractor objects
displayed varied according to the trial type, the movement
and velocity of these objects did not vary as a function of
the number of targets and distractors present in a given trial.
This was accomplished by having the maximum number of
objects (16) moving in the display on every trial, but making
only a subset of them visible for the particular condition
concerned. This technique eliminated any relation between
the density of objects in the display and the freedom of their
movement.
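The motion algorithm is described above only qualitatively. The following Python sketch is a rough approximation of a display with these properties (a constant per-object speed drawn from 4° to 8° per second, direction jitter to approximate Brownian-like motion, an invisible circular barrier around each object, and reflection at the borders of the 15° by 15° area). It is not the original Sun-based implementation, and the jitter magnitude, barrier size, and frame rate are our assumptions.

import math
import random

FIELD = 15.0        # side of the display area, in degrees of visual angle
BARRIER = 1.0       # assumed diameter of the invisible circular barrier (deg)
DT = 1 / 60         # assumed frame duration (s)
N_OBJECTS = 16      # the maximum number of objects is simulated on every trial

class MovingObject:
    def __init__(self):
        self.x = random.uniform(BARRIER, FIELD - BARRIER)
        self.y = random.uniform(BARRIER, FIELD - BARRIER)
        self.speed = random.uniform(4.0, 8.0)         # fixed for the whole trial
        self.heading = random.uniform(0, 2 * math.pi)

    def step(self, others):
        self.heading += random.gauss(0.0, 0.4)        # assumed jitter: Brownian-like path
        nx = self.x + self.speed * DT * math.cos(self.heading)
        ny = self.y + self.speed * DT * math.sin(self.heading)
        # "wall repulsion": reflect the heading at the invisible borders
        if not 0.0 <= nx <= FIELD:
            self.heading = math.pi - self.heading
            return
        if not 0.0 <= ny <= FIELD:
            self.heading = -self.heading
            return
        # circular barrier: refuse any move that brings two objects too close
        for other in others:
            if other is not self and math.hypot(other.x - nx, other.y - ny) < BARRIER:
                self.heading += math.pi               # bounce away from the neighbour
                return
        self.x, self.y = nx, ny

objects = [MovingObject() for _ in range(N_OBJECTS)]
visible = objects[:7]    # e.g., 3 targets + 4 distractors are drawn; the rest still
                         # move, so density never constrains the motion (drawing omitted)
for _ in range(int(8.0 / DT)):                        # roughly 8 s of motion
    for obj in objects:
        obj.step(objects)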
Procedure. Participants were seated in a darkened room
approximately 50 cm from the display and used a chinrest to
reduce head movements and control viewing distance. A
three-button mouse was used to collect responses. Partici-
pants were given written instructions prior to the experi-
ment, which outlined the general procedure and emphasized
the importance of maintaining fixation throughout the
session. Each participant was then given a demonstration
and explanation of a trial sequence. They were instructed to
note the positions of the blinking target objects at the start
of each trial, because the task was to keep track of these
objects when they began moving. During the motion phase,
they were instructed to track the target objects without
moving their gaze from the fixation cross (eye position was
not monitored). At some point in the trial, one of the target
or distractor objects would transform into an E or an H, and
participants were to respond by pressing the E or H button
as quickly as possible when they had identified this form
change. Participants were informed that the target and
distractor objects were equally likely to undergo form
changes. Each participant completed 20 practice trials prior
to the experiment.
Figure 1 depicts a trial sequence. Each trial began with
the presentation of an isolated fixation cross for 2 s. Then,
depending upon the condition, from 7 to 16 figure-eight
objects appeared on the screen. The placement of the objects
was randomly determined on each trial, subject to the
constraint that none could overlap one another or their
invisible "barriers." Three or four of these objects then
began to blink on and off for 3 s, designating them as the
target objects to be tracked. The appearance of the remain-
ing objects (the distractors) did not change during this target
designation phase. All of the objects then began to move in
a random and continuous fashion about the screen (subject
to the previously mentioned constraints), and the partici-
pant attempted to simultaneously track each of the target
objects. The target and distractor objects were indistinguish-
able from one another during this tracking phase. Partici-
pants tracked the targets for a randomly determined interval
of between 5 and 9 s, after which either a target or a
distractor object was transformed into an E or an H. On 50%
of the trials, a target object underwent a form change, and
on the remaining 50% of the trials a distractor underwent a
form change. The remaining targets and distractors, as well
as the new letter in the display (E or H), maintained their
movement during the transformation and continued moving
until the participant responded. After a response had been
made, the screen was cleared and a new trial was initiated
following a 3-s inter-trial interval.

Figure 1. A schematic representation of a trial sequence. Participants viewed the items on a video monitor. In the target designation phase (t2), the target objects were flashed for 3 seconds (the selected targets are shown circled in this illustration). The targets were then tracked for several seconds during the motion phase (t3), after which a target or a distractor object underwent a form change by dropping two segments, ending up as an E or an H (t4). The participants' task was to identify this form change as quickly and as accurately as possible. In this example, there are three target objects, and one of them undergoes a form change to an E.
Each participant completed eight blocks of 45 trials. The
order of trials was randomized separately for each partici-
pant, and there was a five-minute rest period between each
block.
Design. There were three factors manipulated in this experi-
ment: the type of form change (target or distractor form
change), the number of targets tracked (3 or 4) and the
number of distractors present in the display (4, 8, or 12).
There were 30 trials in each of the 12 conditions, for a total
of 360 trials.
RESULTS
Response latencies greater than three seconds were classified
as outliers and were removed from the dataset. This proce-
dure resulted in the removal of 3.6% of the data. Response
latencies and error rates were submitted to a 2 (Type of
Form Change) x 2 (Number of Targets) x 3 (Number of
Distractors) repeated measures analysis of variance (ANOVA).
The mean identification error rate (E or H) across all the
conditions was 3.8%, and a three-factor repeated measures
ANOVA yielded no significant effects, all Fs < 1. Unless
otherwise stated, the p values for all significant statistics
reported in the text are less than .05.
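The original analyses were presumably carried out with the statistical software of the day; as a present-day illustration of the same steps (the 3-s outlier cutoff, averaging to one mean latency per participant per cell, and the 2 x 2 x 3 repeated measures ANOVA), the following Python sketch uses pandas and statsmodels. The file and column names are hypothetical.

import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Hypothetical trial-level data: one row per trial (column names are ours).
trials = pd.read_csv("experiment1_trials.csv")   # subject, change_type,
                                                 # n_targets, n_distractors, rt_ms

# Outlier removal: latencies greater than 3,000 ms are discarded.
trials = trials[trials["rt_ms"] <= 3000]

# One mean latency per participant per cell of the design.
cell_means = (trials
              .groupby(["subject", "change_type", "n_targets", "n_distractors"],
                       as_index=False)["rt_ms"]
              .mean())

# 2 (type of form change) x 2 (number of targets) x 3 (number of distractors)
# repeated measures ANOVA.
anova = AnovaRM(cell_means, depvar="rt_ms", subject="subject",
                within=["change_type", "n_targets", "n_distractors"]).fit()
print(anova.anova_table)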
In the analysis of the response latencies, the main effect
for the number of targets tracked (3 or 4) was not signifi-
cant, F < 1. That is, response latencies were similar whether
three or four target objects were being tracked. The mean
response latencies in Experiment 1, collapsed across the
number of targets tracked (3 or 4), are listed in Table 1.
There was a significant main effect for the type of form
change (target or distractor object), which confirmed that
participants responded more rapidly to target form changes
than to distractor form changes, F(1, 16) = 29.58,
MSE = 25,084. Responses to target form changes were an
average of 121 ms faster than responses to distractor form
changes. The main effect for the number of distractors (4, 8,
or 12) was significant, F(2, 32) = 17.04, MSE = 9,009, as
response latencies increased as the number of distractors in
the display increased.
The interaction between the type of form change and the
number of distractors was not significant, F(2, 32) = 1.91,
MSE = 7,772. The absence of this interaction suggests two
things: (a) that participants responded to target form changes
more rapidly than to distractor form changes regardless of
the number of distractors in the display, and (b) that
response latencies to both target and distractor form changes
increased with increases in the number of distractors. This
latter finding was unexpected, and will be discussed in
greater detail in the Discussion section.
The interaction between the number of targets and the
number of distractors was not significant, F < 1, nor was
the interaction between the type of form change and the
number of targets, F(1, 16) = 3.40, MSE = 7,692. Finally, the
three-way interaction was not significant, F < 1.
TABLE 1
Mean Response Latencies (in Milliseconds) and Identification Error Rates
(in %) to Target and Distractor Form Changes in Experiment 1

                          Type of Form Change
                     Target                    Distractor
Number of
Distractors      M      SE    Errors       M      SE    Errors
 4             1022    43.4    3.8       1175    51.6    3.8
 8             1080    34.1    4.6       1194    50.9    3.4
12             1145    36.3    3.1       1241    46.3    4.5

Note. The data are averaged across the number of targets (3 or 4).

Analysis of region-bounded effects. The results of this experi-
ment clearly indicate that target form changes are identified
more rapidly than distractor form changes when partici-
pants are tracking the target objects. This result supports our
hypothesis
that visual indexing facilitates attentional
processing.
Although we contend that the latency advantage for
target form changes is a consequence of the visual indexing
of the target objects, there is an alternative explanation for
this finding. Specifically, it is possible that during the
tracking task the participant's attention was focused on the
continually changing triangular or polygonal figure whose
vertices constituted the three or four targets in the display.
Maximum attentional sensitivity may then have been
allocated to the region bounded by and including the group
of target objects. Such a process could be the consequence of
a deliberate attentional strategy employed by participants to
enable target tracking, in which a "zoom-lens" of visual
attention was focused on this region (Eriksen & St. James,
1986). If this was the case, then we would expect that
responses to target form changes would be facilitated,
because all of the target form changes would have occurred
within
this region of focused attention. Moreover,
distractors undergoing form changes within this region
would be identified more rapidly than distractors undergo-
ing form changes outside of this region, and the averaging of
these two types of trials would produce an apparent delay in
responding to distractor form changes relative to target form
changes. Thus, a zoom-lens model could account for our
data by postulating that participants dynamically allocate
attention to the region encompassed by the targets.
To evaluate this hypothesis, for the last 6 of the 17
participants in Experiment 1, the screen coordinates of the
target and distractor objects were recorded following every
response. The trials in which a distractor object underwent
a form change were divided into two groups: trials in which
the participant responded to the form change when the
distractor was within the region encompassed by the targets,
and trials where the participant responded to the form
change when the distractor was outside of this region. The
target locations at the time of the response served as the
vertices of the region in question, with the boundaries of the
entire region defined by the convex hull of the target objects
(the smallest convex polygon that contained all of the
targets). These data were then submitted to a one-factor
repeated measures ANOVA, with distractor position
(distractor inside or outside the convex hull of the targets) as
the single factor. A zoom-lens account would predict an
effect of distractor position in this analysis, with distractor
form changes being identified more rapidly when they
occurred within the convex hull of the target set.
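To make the geometry of this classification concrete, the following Python sketch (scipy, with made-up coordinates; not the original analysis code) tests whether a distractor's position falls within the convex hull of the target positions recorded at the moment of the response:

import numpy as np
from scipy.spatial import Delaunay

def inside_target_hull(target_xy, point_xy):
    """True if the point lies within the convex hull of the target positions.
    A Delaunay triangulation of the targets reduces the point-in-hull test to a
    simplex lookup: find_simplex returns -1 for points outside the hull."""
    tri = Delaunay(np.asarray(target_xy, dtype=float))
    return tri.find_simplex(np.asarray(point_xy, dtype=float)) >= 0

# Hypothetical screen coordinates (pixels) recorded at the time of the response.
targets = [(310, 420), (560, 180), (720, 510), (450, 640)]
print(inside_target_hull(targets, (500, 400)))   # inside the hull  -> True
print(inside_target_hull(targets, (100, 100)))   # outside the hull -> False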
However, in this analysis the effect of distractor position
was not significant, F < 1. That is, response latencies to
distractor form changes were similar regardless of whether
the distractor was inside (1,215 ms) or outside (1,238 ms) the
convex hull of the target set.¹ Consequently, it does not
appear that the facilitation in responding to target form
changes can be explained by appealing to a zoom-lens
mechanism. Note that a similar result was also reported by
Intriligator and Cavanagh (1992), who used a variant of the
multiple object tracking task involving only two targets
moving in a rigid configuration. They reported that the
detection of simple luminance changes was facilitated when
they occurred on targets being tracked, but not in the region
between the two tracked objects.

¹ Because the data from only 6 of the 17 participants in this experiment were
available for this analysis, one could argue that the lack of an effect of
distractor position (inside or outside the convex hull of the target set) was
due to a reduction in statistical power. However, the data from the same six
participants indicated that target form changes were responded to more rapidly
than distractor form changes, F(1, 5) = 14.19, MSE = 22,103. Moreover, there
was no effect of distractor position in Experiment 2 either, where the data
from 24 participants were examined.
DISCUSSION
Why are target form changes identified more rapidly than
distractor form changes during the tracking task? According
to our hypothesis, the indexes bound to target objects confer
an access priority to these objects, so that target objects can
be checked via focal attention before distractor objects when
a form change occurs. Distractor form changes will only be
identified by way of a serial self-terminating attentional scan
initiated after the indexed objects have been checked. Our
explanation is similar to that of Yantis and Johnson (1990),
who argued that a small number of abrupt onset items can
be attentionally tagged and then examined before no-onset
items in visual search displays, producing a processing
advantage for abrupt onset items. Similarly, our claim is that
the target objects are indexed at the start of each trial, and
that this indexing allows them to be continually referenced
when they are in motion. When a form change occurs, the
indexes bound to the target objects allow them to be directly
examined via focused attention, so that target objects are
checked before distractor objects for the presence of a form
change. Note that we are assuming that the form change
itself can be detected without the aid of focused attention.
More specifically, we contend that the initial registration of
the form change event is detected preattentively, perhaps by
a generalized difference operator which signals that some
general change in the display has occurred but does not
provide precise location or identity information (e.g.,
Atkinson & Braddick, 1989; Scialfa & Joffe, 1995). The
detection of the form change would then permit the allocation
of the focal attention required to identify the type of
form change (E or H).
If the target objects are being checked before the
distractor objects when a form change occurs, then target
form changes will be identified more rapidly than distractor
form changes. This would seem to be a plausible account of
the data at hand, and yet this explanation is not consonant
with the fact that responses to target form changes were
increasingly delayed as the number of distractors in the
display increased. That is, if the target objects are always
checked before the distractors, then increasing the number
of distractor objects in the display should have no effect on
the speed at which target form changes are identified.
This application of the visual indexing model assumes,
however, that the target objects are flawlessly indexed and
tracked throughout every trial. This assumption is almost
certainly false, as the tracking task itself is subject to error,
and failures of tracking do occur. It is probable that indexes
do not always remain permanently bound to the target
objects throughout the tracking task. Although we maintain
that indexing and tracking a small number of objects is
preattentive, maintaining those indexes over an extended
period of time requires some effort or reactivation to
prevent their decay or loss. A reasonable assumption is that
the probability of an index being lost or misplaced would
increase with the number and/or density of distractors in
the display, so that increases in the number of distractors
would lead to decreases in tracking performance. It is
known that the attentional resolution required for individu-
ating objects drops off rapidly with eccentricity (Intriligator,
1998), increasing the chances that when a target and a
distractor object pass close to one another in the periphery
of the display, the two objects may then momentarily fail to
be resolved. When the target and distractor subsequently
move away from each other, the index could be shifted to
the wrong object or lost altogether. If an index was shifted
to a distractor, the participant might then consider this
object a target, while the object that had been tracked might
then become a distractor.
If the index bound to a target object was lost or shifted,
and the same target subsequently underwent a form
change, then responses to this form change would be
delayed. In effect, this target would now be a distractor, and
responses to these nonindexed target form changes would be
slower than responses to indexed target form changes.
During the course of the experiment, a fair number of these
events may have occurred, increasing the average response
latencies to target form changes. More specifically, as the
number of distractors in the display increased, the probabil-
ity that an index would be lost or shifted to a distractor
would also have increased. Averaging the response latencies
to form-changing targets that were tracked with those that
were not tracked would create a spurious increase in
response latencies to target form changes with increases in
the number of distractors. In Experiment 2 we tested this
possibility.
Experiment 2
If imperfections in the process of indexing and tracking the
target objects were responsible for the display size effect
observed in Experiment 1, then such an effect should not
occur if the targets are perfectly tracked throughout every
trial. Although it is not possible to ensure that all the targets
are perfectly tracked, we have designed a procedure that
may, given certain assumptions, help us to determine
whether a particular target that underwent a form change
had in fact been correctly tracked on that particular trial. To
do this, we developed a dual-response procedure in which
we asked participants to indicate, on each trial, whether the
object that underwent a form change was a target or a
distractor. Thus, the participant was required to indicate the
identity of the changed object and also whether it was a
target or a distractor.
In this experiment, participants tracked four target
objects and made a speeded identification response to a form
change in the display (E or H), after which they made a two-
alternative forced choice categorization decision as to what
type of object underwent a form change (target or
distractor). Because participants had to indicate what type of
object underwent a form change on each trial, the trials
where participants had lost a form-changing target could be
distinguished from the trials where a form-changing target
was accurately tracked. If failures in tracking performance
produced the display size effect for target form changes in
Experiment 1, then excluding from the RT analyses those
trials where participants had lost a form-changing target
should greatly attenuate and perhaps even eliminate this
display size effect.
Of course, the success of this method depends on partici-
pants being able to categorize the changed objects as ones
they had or had not tracked. Because there will be some
trials on which participants may not be certain as to
whether the object that underwent a form change was a
tracked object, their category assignment may contain both
errors of omission and errors of commission. Such errors
may occur for several reasons. For example, a target may be
lost because the index simply becomes dissociated from it —
in which case the forced-choice categorization response may
merely reflect a guessing strategy. On the other hand, if the
target is lost because the index shifts to a nearby distractor,
the participant may unknowingly take the newly indexed
object to be a target or might even be aware of the shift and
exercise some wariness. Clearly, our ability to isolate those
trials in which the participant correctly tracked a form-
changing target will not be without some error. But so long
as the errors are relatively infrequent and the bias in the use
of the target versus distractor categories remains relatively
fixed over the conditions of the experiment, this method
may provide a useful way of restricting the trials used to
compute the RT measure to those trials on which the participant had
actually tracked the form-changing target.
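To make the logic of this trial-selection procedure concrete, the following pandas sketch (hypothetical file and column names; not the original analysis) shows how the identification latencies would be recomputed using only those target-change trials on which the changed object was also categorized as a target:

import pandas as pd

# Hypothetical trial-level data from Experiment 2 (column names are ours):
# subject, change_type ('target' or 'distractor'),
# categorized_as ('target' or 'distractor'), n_distractors, rt_ms
trials = pd.read_csv("experiment2_trials.csv")
trials = trials[trials["rt_ms"] <= 3000]          # same 3-s outlier cutoff as before

# Keep only target form changes that the participant also categorized as a
# target, i.e., trials on which the changed target was plausibly still tracked.
tracked = trials[(trials["change_type"] == "target") &
                 (trials["categorized_as"] == "target")]

per_subject = tracked.groupby(["subject", "n_distractors"])["rt_ms"].mean()
print(per_subject.groupby(level="n_distractors").mean())   # display-size effect,
                                                           # tracked targets only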
In summary, we can state our specific predictions as
follows: (a) If participants are more likely to lose target
objects as the number of distractors in the display increases,
then target categorization errors (incorrectly attributing
target form changes to distractors) should increase with
increases in the number of distractors. (b) Response latencies
to form-changing targets that are no longer tracked (and
thus not indexed) should be slower than those for accurately
tracked targets. (c) When the accuracy of the participants'
categorization decisions (target or distractor form change) is
ignored, response latencies to both target and distractor
form changes should increase with increases in the number
of distractors in the display (as in Experiment 1). (d) Exclud-
ing those trials where participants had lost a form-changing
target should greatly attenuate the effect of display size on
identification RT.
METHOD
Participants. Twenty-four individuals participated in this
experiment, and were paid $10 upon completion of the 45-
minute session. None of these individuals had participated
in Experiment 1.
Apparatus and stimuli. The apparatus and stimuli were
identical to those of Experiment 1.
Procedure. Participants tracked four targets during every
trial, and after 5 to 9 s of tracking, one of the objects was
transformed into an E or an H figure. The time to identify
this form change was measured. The probability that a
target or a distractor would undergo a form change was
identical (50%), and the participants were informed of these
probabilities.
After the participant had responded to the form change,
the screen was cleared and two boxes labeled "Tracker" (the
target) and "Non-Tracker" (the distractor) appeared. The
participant then indicated what type of form change had
occurred (target or distractor form change) by moving a
mouse pointer into the appropriate box and then pressing a
mouse button. This categorization decision was made on
every trial, and participants were instructed to guess when
they were unsure of the correct response.
Design. Two factors were manipulated: the type of form
change (target or distractor object) and the number of
distractors in the display (4, 8, or 12). There were 30 trials in
each of the six conditions of this experiment.
RESULTS AND DISCUSSION
The mean identification error rate (E or H) across all the
conditions was 3.9%, and a two-factor repeated measures
ANOVA (Type of Form Change x Number of Distractors)
yielded no significant effects, all Fs < 1. Response latencies
greater than 3 seconds were classified as outliers and were
removed from the dataset. This procedure resulted in the
removal of 3.1% of the data.
Identification data. Before looking at the categorization data,
we replicated the analysis that was done in the first experi-
ment. In this analysis, the accuracy of the participants'
categorization decisions (i.e., attributing the form changes to
target or distractor objects) was ignored when computing
average response latencies. Thus, the response latencies to
form-changing targets that were no longer being tracked (as
indexed by target categorization errors) were averaged with
the response latencies to targets that had been accurately
tracked. This treatment of the data should reproduce the
pattern of effects observed in Experiment 1: Responses to
target form changes should be faster than responses to
distractor form changes, and response latencies to both
target and distractor form changes should increase with
increases in the number of distractors.
The pattern of results in this analysis was identical to that
of Experiment 1. The mean response latencies and error
rates are listed in Table 2. A two-factor (Type of Form
Change x Number of Distractors) repeated measures
ANOVA revealed a significant main effect for the type of
form change, F(1, 23) = 46.52, MSE = 34,500, and the
number of distractors, F(2, 46) = 9.48, MSE = 9,920, but no
interaction between these two factors, F(2, 46) = 2.28,
MSE = 7,383. Participants identified target form changes an
average of 211 ms faster than distractor form changes, and
response latencies to both target and distractor form changes
increased with increases in the number of distractors. Note
that response latencies were longer in this experiment than
in Experiment 1, presumably because participants had to
determine the identity of the changed object (E or H) as well
as the type of changed object (target or distractor) prior to
responding.
TABLE 2
Mean Response Latencies (in Milliseconds) and Identification Error Rates
(in %) to Target and Distractor Form Changes in Experiment 2

                          Type of Form Change
                     Target                    Distractor
Number of
Distractors      M      SE    Errors       M      SE    Errors
 4             1410    65.1    4.4       1609    64.7    3.1
 8             1481    62.1    4.1       1662    67.4    4.3
12             1469    59.7    3.6       1722    65.6    3.8

Categorization data. Earlier we argued that target objects are
not always perfectly tracked during the tracking task, and
that the probability of losing one or more targets increases
with increases in the number of distractors. This assumption
can now be directly tested. Because participants made a
categorization decision on every trial to indicate whether a
target or a distractor underwent a form change, the trials in
which participants accurately tracked form-changing target
objects could be distinguished from the trials in which they
did not. For example, if a target object underwent a form
change and the participant categorized the event as a
distractor form change, then the participant had not accu-
rately tracked this particular target throughout the trial.
Responses in which the participant correctly identified the
form change (E or H) but incorrectly categorized the type of
form change (target or distractor) were classified as categori-
zation errors.
Table 3 lists the percentage of target and distractor