ERPs and alpha oscillations track the encoding and maintenance of object-based representations in visual working memory
Siyi Chen, Thomas Töllner, Hermann J. Müller, & Markus Conci
Ludwig-Maximilians-Universität München, Munich, Germany
Short title: Object-based representations in visual working memory
Word count: 8897 (main text) + 250 (abstract)
Correspondence:
Siyi Chen, Allgemeine und Experimentelle Psychologie, Department Psychologie, Ludwig-Maximilians-Universität München, Leopoldstr. 13, D-80802 München, Germany
Email: Siyi.Chen@psy.lmu.de
Abstract
When memorizing an integrated object such as a Kanizsa figure, the completion of parts into a coherent whole is attained by grouping processes which render a whole-object representation in visual working memory (VWM). The present study measured event-related potentials (ERPs) and oscillatory amplitudes to track these processes of encoding and representing multiple features of an object in VWM. To this end, a change detection task was performed, which required observers to memorize both the orientations and colors of six ‘pacman’ items while inducing configurations of the pacmen that systematically varied in terms of their grouping strength. The results revealed an effect of object configuration in VWM despite physically constant visual input: change detection for both orientation and color features was more accurate with increased grouping strength. At the electrophysiological level, the lateralized ERPs and alpha activity mirrored this behavioral pattern. Perception of the orientation features gave rise to the encoding of a grouped object as reflected by the amplitudes of the PPC. The grouped object structure, in turn, modulated attention to both orientation and color features as indicated by the enhanced N1pc and N2pc. Finally, during item retention, the representation of individual objects and the concurrent allocation of attention to these memorized objects were modulated by grouping, as reflected by variations in the CDA amplitude and a concurrent lateralized alpha suppression, respectively. These results indicate that memorizing multiple features of grouped, to-be-integrated objects involves multiple, sequential stages of processing, providing support for a hierarchical model of object representations in VWM.
Keywords: visual working memory, object-based representation, grouping, lateralized ERPs, lateralized alpha suppression
Introduction
When perceiving meaningful visual objects in our cluttered environment, the visual system has to integrate disparate component parts into coherent wholes, as demonstrated, for example, by Kanizsa-type illusory figures (Kanizsa, 1955). For instance, as depicted in Figure 1A (left panel), a configuration of six “pacman” elements generates the perception of a star-shaped illusory object (a so-called ‘Kanizsa’ figure) with sharp boundaries that are perceived as lying above the inducing circular elements. The perception of such an illusory object is usually referred to as “modal completion” (see Michotte, Thines, & Crabbe, 1964/1991). Recent neuroimaging studies showed activations in the lateral occipital complex (LOC) to be linked to the processing of Kanizsa figures, with closed shapes being represented via feedback signals from mid-level visual areas to lower-level striate and extrastriate areas (Chen et al., 2020, 2021b; Altschuler et al., 2012; Murray et al., 2002; Lee & Nguyen, 2001; Stanley & Rubin, 2003).
The operation of binding smaller units into integrated whole objects not only supports the structuring of perceptual input for more efficient orienting and action in the environment, but also reduces capacity limitations in visual working memory (VWM; Delvenne & Bruyer, 2006; Morey, 2019; Morey et al., 2015; Nie et al., 2017; Peterson & Berryhill, 2013; Woodman et al., 2003; Vogel et al., 2001). For instance, when remembering the orientation of a gap in various disks, memory performance improves when neighboring disks are grouped to form an illusory rectangle, thereby effectively doubling the maximum number of reportable items in VWM (Diaz et al., 2021; Gao et al., 2016). It has also been suggested that individual, nonspatial features (such as color and orientation) might be represented as bound objects in VWM (e.g., Luck & Vogel, 1997; Luria & Vogel, 2011; but see Gao et al., 2011; Ma et al., 2014). For instance, Luck and Vogel (1997) showed that VWM performance was essentially independent of the number of to-be-memorized features that constituted a given object; instead, memory capacity depended primarily on the number of individuated objects that had to be retained (see also Delvenne & Bruyer, 2004; Vogel et al., 2001; but see Wheeler & Treisman, 2002). Recently, Chen et al. (2021a) combined manipulations of spatial grouping with a concurrent manipulation of feature binding (see also Luck & Vogel, 1997; Luria & Vogel, 2011; Fougnie et al., 2013; Olson & Jiang, 2002; Xu, 2002; Ecker et al., 2013). In their study, a change detection task was used, which required participants to memorize six pacman elements, each depicting a unique color and orientation as presented in an initial memory display. The oriented pacmen could form a complete illusory star, a partially grouped triangle, or an ungrouped configuration – thus gradually manipulating the strength of the complete-object representation (see examples in Figure 1A).
Following a brief delay after the memory display offset, a single pacman probe item appeared at one of the locations that had been occupied by an item in the memory display. The task was to decide whether the probe item was the same as or different from the pacman presented previously at the same location in the memory display. Importantly, the change could occur for grouping-relevant features (orientation) or for grouping-irrelevant features (color). Thus, by systematically varying the amount of closure in the Kanizsa-type configuration (from a complete grouping through a partial grouping to an ungrouped configuration) via variations in orientation, memory performance for individual features (orientation and color) could be assessed relative to the grouping that was displayed. The results showed that the grouped object enhanced both the (grouping-relevant) orientation and (grouping-irrelevant) color representations when both features were task-relevant (for the same/different judgment), demonstrating that memory for various features can be improved by encountering them in a spatial grouping.
While grouping benefited the storage of both grouping-relevant and -irrelevant features in VWM, it remains unclear which processes contribute to this benefit, as a facilitatory effect could emerge at various stages of processing. For instance, current models that link object perception, attention and memory (for reviews see e.g., Bundesen et al., 2011; Walther & Koch, 2007) would differentiate between a hierarchy of sequential processing stages that comprises differentiable computational mechanisms and neuronal sources of processing, which encompass the initial, early perceptual stimulus analysis, the subsequent allocation of attention to selected objects, followed by their maintenance in memory. The present study was designed to investigate these component processes by taking advantage of previously established event-related potential (ERP) and oscillatory markers associated with the encoding and maintenance of working memory contents, the aim being to identify critical processes that are influenced by object grouping. That is, we tracked the temporal dynamics of illusory figure processing in order to investigate how object integration impacts early perceptual, attentional, and memory-related processing stages.
The first series of lateralized ERP components of interest includes the early positivity posterior contralateral (PPC), the subsequent posterior N1pc, as well as the attention-related N2pc (also referred to as PCN). PPC-like activations have been suggested to reflect selective visual processing under conditions with relative saliency differences between target and distractor stimuli (Akyürek & Schubö, 2011; Corriveau et al., 2012; Fortier-Gauthier et al., 2012; Jannati et al., 2013; Gokce et al., 2014; Barras & Kerzel, 2017), with a positive-going deflection emerging contralateral to the target when the distractor is more salient than the target in the opposite hemifield (Fukuda & Vogel, 2009; Wascher & Beste, 2010; but see Töllner et al., 2012). For instance, the PPC was found to be enhanced when the target was a non-salient “ungrouped” Kanizsa-type configuration and the distractor a grouped, salient Kanizsa figure (presented in the hemifield opposite to the target), relative to a condition that reversed the target and distractors and required observers to search for a salient (grouped) target among a non-salient, ungrouped distractor (Wiegand et al., 2015). Thus, in visual search experiments, search items are usually distributed across both visual hemifields, and the PPC modulation appears to reflect, in particular, the difficulty of ignoring salient distractors when searching for a less salient target. By contrast, in working memory tasks, the to-be-memorized array is typically presented in only one hemifield, which is indicated by an arrow cue. In this case, the PPC would be interpreted as reflecting the initial (perceptual) processing of task-relevant, attended stimuli (Fortier-Gauthier et al., 2012).
A number of studies also found ERPs in response to illusory figures, as compared to ungrouped baseline configurations, to reveal differential processing in the posterior N1 (e.g., Herrmann & Bosch, 2001; Murray et al., 2004; Proverbio & Zani, 2002; Senkowski et al., 2005; see also Murray et al., 2002, for even earlier effects), where this early signal might reflect the initial biasing of attentional priority towards illusory figures in the competition for selection (Senkowski et al., 2005). In the subsequent time window, the actual spatial-attentional selection of grouped vs. ungrouped configurations is indexed by the N2pc (Conci et al., 2006; 2011; Töllner et al., 2015). Previous work showed search to be more efficient for grouped, as compared to ungrouped, targets (Conci et al., 2007; see also Nie et al., 2016), and this is associated with larger N2pc amplitudes – which is indicative of enhanced engagement of focal attention by the grouped target (Conci et al., 2011) as opposed to a broader tuning of attention by grouped, task-irrelevant distractors (Conci et al., 2006). Thus, previous evidence suggests that the processing of an illusory figure might be reflected in early perceptual ERPs (PPC), in the subsequent biasing of initial attentional priorities (N1pc), and in the N2pc, which is typically associated with the allocation of (focal) attentional processing resources to a given (target) item (e.g., Eimer, 1996).
An additional component of interest is the contralateral delay activity (CDA), a sustained negativity during the delay period between the memory and test displays. The CDA has been found to monotonically scale with the number of items held in VWM up to the measured storage limit (of approximately 3–4 items; Fukuda et al., 2015; Luria et al., 2016; Vogel & Machizawa, 2004). The CDA amplitude has also been reported to decrease in some studies when to-be-remembered objects are bound or grouped into higher-order units (Luria & Vogel, 2011; Luria et al., 2016; Peterson et al., 2015), suggesting that it actually reflects the number of “integrated units” represented in VWM. For example, the CDA amplitude was comparable when memorizing only orientation features as opposed to both color and orientation features, which were presented on the same physical objects, whereas the CDA increased when the same orientation and color features were presented as separate objects (Luria & Vogel, 2011; Woodman & Vogel, 2008). The difference in the CDA amplitude thus appears to reflect the number of separable objects. Moreover, it has also been reported that similar colors may be compressed in VWM such that the CDA amplitude for these colors is essentially comparable to the amplitude for just one to-be-memorized color (Gao et al., 2011; Peterson et al., 2015). Finally, the CDA has also been shown to provide a characteristic, task-dependent signature of the active maintenance process, where a larger CDA amplitude is observed for identical stimuli when the task requires the encoding of objects with high (as opposed to low) precision (Machizawa et al., 2012). In agreement with this finding, Chen et al. (2018b) investigated “amodal” completion (of occluded objects) in VWM and reported a sustained increase in the CDA amplitude for globally completed objects (as compared to uncompleted objects).
For instance, when observers were required to memorize occluded parts of an object, persistent mnemonic activity (as indexed by an increased CDA amplitude) was required to generate complete-object representations from physically specified fragments and to maintain the resulting complete-object representations in a readily accessible form (see also Ewerdwalbesloh et al., 2016; Pun et al., 2012; Emrich et al., 2008). This suggests that the representation of a globally completed object may, in some cases, also require more (rather than fewer) mnemonic resources. Previous studies not only reported comparable behavioral dynamics (e.g., Chen et al., 2018a) but also partly overlapping neural mechanisms for amodal and modal completions (Murray et al., 2004). It is therefore conceivable that modally completed, grouped vs. ungrouped variants of a Kanizsa figure reveal similar VWM storage properties and generate similar CDA patterns to shapes that are completed on the basis of amodal completion. In sum, the role of the CDA concerning object binding and grouping reveals a rather complex and seemingly flexible mechanism, which does not necessarily reflect bottom-up objecthood cues on the basis of their salience alone (for a review, see Luria et al., 2016). Rather, the CDA appears to depend on specific stimulus characteristics in combination with the related task demands.
Apart from ERPs, the maintenance process can also be tracked with oscillatory markers. Several studies have demonstrated that posterior (putatively visual) alpha oscillations (8–12 Hz) in the retention interval are reduced in amplitude contralateral vs. ipsilateral to the retinotopic location of the to-be-retained items (e.g., Grimault et al., 2009; Lozano-Soldevilla et al., 2014), evidencing a relative amplitude difference between mnemonically relevant and irrelevant information. Accordingly, lateralized alpha-band activity has been taken to play a role in mnemonic retention (for a review, see van Ede, 2018; Medendorp et al., 2007; Fukuda et al., 2015; Erickson et al., 2017). Several studies have further demonstrated a link between alpha oscillations during retention and the concurrent location and orientation of to-be-remembered items (Foster et al., 2016; Fukuda et al., 2016), suggesting that alpha oscillations during VWM maintenance also track feature-specific identity information of the to-be-memorized items (Fukuda et al., 2016). Note that posterior-occipital alpha has also been widely suggested to reflect an online index of top-down adjustments of attentional control (e.g., Thut et al., 2006; Murphy et al., 2020; Wang et al., 2019; 2021; Woodman et al., 2022), which is a critical factor contributing to effective VWM maintenance (Unsworth et al., 2014; Engle & Kane, 2004). Moreover, posterior-occipital alpha suppression has been shown to vary with changes in attentional engagement (Boudewyn & Carter, 2017), with larger alpha suppression being evident when the attentional demands increase. Recall that VWM is usually considered to reflect a system that provides both short-term stores of representational formats and concurrent attentional, “executive” control structures that keep task-relevant information active and accessible during maintenance (Engle & Kane, 2004).
The CDA and lateralized alpha may thus be mapped onto two separable cognitive mechanisms, relating to (i) the representation of individual objects and (ii) associated internal attentional control processes, respectively. That is, an increase in the lateralized alpha suppression for the to-be-remembered items might be directly associated with the increase in attentional control in particular when the number of items in the display exceeds the individual’s capacity to select a manageable subset of items for efficient VWM storage (see also Fukuda et al., 2015).
In summary, the present study was designed to examine neural processing stages potentially implicated in the grouping benefits when memorizing individual features. Participants’ (lateralized) electrophysiological brain activity was recorded while they performed a change detection task that presented a to-be-memorized configuration comprising six pacman items on one side of the display and a to-be-ignored placeholder configuration of six gray circles on the other side. Participants had to memorize the color and orientation of pacman items that were presented either as a fully grouped, a partially grouped, or an ungrouped configuration. Note that the various pacman arrangements produced configurations differing in grouping strength, however without impacting the low-level properties of the image (see Figure 1A). That is, the number of items and their overall physical stimulation was identical for the grouped, partially grouped and ungrouped stimulus configurations (and for the task-irrelevant placeholders), and the three to-be-memorized types of configuration would therefore only differ in terms of grouping strength from each other. Subsequent to a retention interval, the test display was presented, which would reveal a probe item on the cued side (and a placeholder circle on the uncued side). The probe would either depict a color change, an orientation change, or no change (see Figure 1B). In this way, we were able to track at the neural level how the VWM representation of individual features is aided by grouping. We assessed behavioral performance measures (change detection accuracy) and lateralized ERP components, as well as oscillatory signals.
Based on our previous, related study (Chen et al., 2021a), we expected a grouping benefit in change detection performance, which could in principle be mirrored in several lateralized ERP components and/or in corresponding oscillatory signals. We predicted that PPC amplitudes, which reflect the initial perceptual processing of the stimuli, might be modulated by the grouping of the to-be-memorized configurations because of their inherent differences in the attentional requirements of initial visual processing. For instance, the less a given configuration is grouped, the greater the attentional requirements to process this stimulus, which should be reflected in the PPC amplitudes. Variations in attentional selection should also be evident in the subsequent N1pc and N2pc components, revealing a more focused (and more strongly lateralized) shift of attention to the to-be-memorized configuration alongside an increase in grouping strength. For the memory stage, orientation-based grouping might reduce the load by maintaining integrated, coherent shape representations, thus enhancing the VWM capacity for both color and orientation features, resulting in increased CDA amplitudes. At the same time, the generation of a global shape representation in the grouped Kanizsa figure might also be expected to require more mnemonic resources, or storage capacity, than less grouped items in order to achieve a higher representational precision, and this should also impact the CDA. Finally, lateralized alpha suppression contralateral to the to-be-remembered configurations was expected to reveal variations of cognitive control devoted to the memorized items in order to keep them active and accessible during the execution of complex cognitive tasks.
There might be a larger alpha suppression for ungrouped relative to more grouped configurations, reflecting the greater executive attention (and increased difficulty) involved in holding the individual features of ungrouped configurations during maintenance.
Method
Participants. Twenty-four volunteers (12 females; mean age = 26.13 years, SD = 2.67 years; all right-handed) participated in the experiment for payment of €9.00 per hour. All participants had normal or corrected-to-normal visual acuity and normal color vision. No participant reported any mental or neurological disease. All observers provided written informed consent, and the experimental procedure was approved by the ethics committee of the Department of Psychology at Ludwig-Maximilians-University, Munich. The sample size was larger than in previous, similar studies (Chen et al., 2021a; Gao et al., 2016). A power analysis conducted with G*Power (Erdfelder et al., 1996) revealed that to detect a relatively large effect, f(U) = 0.5, of object configuration with a power of 95% and an alpha of .05, a sample of only 12 participants would be required. We further increased our sample to N = 24 observers to ensure sufficient statistical power in our analyses.
Apparatus and Stimuli. The experiment was programmed in Matlab using Psychophysics Toolbox functions (Brainard, 1997). Stimuli were presented on a 19-inch computer monitor (1,024 × 768 pixels screen resolution, 85-Hz refresh rate) against a black screen background (0.25 cd/m²). Participants were seated at a distance of approximately 65 cm from the screen inside a shielded Faraday cage (Industrial Acoustics Company GmbH, Germany).
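For readers who wish to reproduce the viewing geometry, the following is a minimal sketch of how stimulus sizes in degrees of visual angle could be converted to pixels for the reported setup. It is written in Python purely for illustration (the experiment itself was programmed in Matlab), and the function names are hypothetical. It assumes square pixels and that the full 19-inch diagonal is visible, neither of which is stated in the text.

```python
import math

# Values reported in the text.
VIEW_DIST_CM = 65.0
RES_X, RES_Y = 1024, 768
DIAGONAL_INCH = 19.0


def screen_width_cm(diagonal_inch=DIAGONAL_INCH, res_x=RES_X, res_y=RES_Y):
    """Physical screen width in cm.

    Assumption: square pixels and a fully visible diagonal, so the width is
    the diagonal scaled by the horizontal share of the pixel diagonal.
    """
    diagonal_px = math.hypot(res_x, res_y)
    return diagonal_inch * 2.54 * res_x / diagonal_px


def deg_to_px(deg, view_dist_cm=VIEW_DIST_CM):
    """Convert an extent centered on the line of sight from degrees to pixels."""
    size_cm = 2.0 * view_dist_cm * math.tan(math.radians(deg) / 2.0)
    return size_cm * RES_X / screen_width_cm()


if __name__ == "__main__":
    for extent in (0.6, 1.1):  # fixation circle and arrow cue sizes from the text
        print(f"{extent} deg is roughly {deg_to_px(extent):.1f} px")
```

Under these assumptions, one degree of visual angle corresponds to about 30 pixels; a different visible screen area would change this figure proportionally.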
A bilateral version of the change detection task was adapted from previous studies, so as to be able to measure lateralized EEG components (e.g., Vogel & Machizawa, 2004). The to-be-memorized stimulus configuration (which was either presented on the left or right side of the screen) consisted of six items, presented on an imaginary circle (radius: 4° of visual angle), with all items arranged equidistantly to one another. Each item was a filled circle with a radius of 2.4° of visual angle and a 60° opening (1/6 of the overall area of the circle), thus forming a “pacman”-like figure. Each pacman was presented in a different color (all 5.0 cd/m²; blue, RGB: 49,64,249; red, RGB: 172,11,2; green, RGB: 15,102,11; purple, RGB: 138,35,160; orange, RGB: 140,70,0; and mint, RGB: 50,99,109) and with a different orientation of its “mouth” (i.e., for a given pacman, the cut-out section could be rotated at an angle of 0°, 60°, 120°, 180°, 240°, or 300°, respectively). The distribution of the six colors among the six items was randomized on every trial. The distribution of the “mouth” orientations was determined by the three experimental conditions that were presented with equal probability throughout the experiment. In the “ungrouped” condition, the six possible mouth orientations were randomly assigned to the six display locations (Figure 1A, Ungrouped). In the “partial-grouping” condition, the openings of three items were oriented towards the center of the display, thus forming either an upward- or downward-pointing (illusory) triangle (Figure 1A, Partially grouped). The mouth orientations of the other, remaining three items were selected randomly from the remaining three orientations (without replacement of an already assigned orientation). Finally, in the “grouped” condition, the openings of all six items were oriented towards the center of the screen such that they formed an illusory star (Figure 1A, Grouped).
In this way, a given memory display would always consist of six distinct colors and six distinct mouth orientations, irrespective of the grouping condition. Thus, for all three types of configuration, each display presented an equal number of (six) colors and orientations, such that the basic physical stimulation was identical across conditions. Of note, the ungrouped configuration served as a baseline: the pacman elements were randomly oriented (as well as randomly colored), making them unlikely to render any kind of grouped object, allowing us to assess whether change detection performance would be enhanced by any type of grouped structure. Finally, in the hemifield opposite to the memory array, a to-be-ignored placeholder configuration was presented, which consisted of six gray (RGB: 92,92,92) circles with a central hole (Figure 1A, Placeholder). These placeholders were similar in luminance to the memory items, and the size of the removed central circle corresponded to the size of the cut-out segment in the pacman items. This ensured that both display halves presented stimulus arrays with an identical physical stimulation, yet only the memory configuration provided task-relevant color and orientation information, while the placeholders remained constant throughout the entire experiment.
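The assignment of mouth orientations in the three grouping conditions can be illustrated with a minimal sketch. The code below is in Python for illustration only and is not the experiment code (which was written in Matlab); all names are hypothetical. It assumes that location i lies at polar angle i × 60° on the imaginary circle and that mouth angles are expressed in the same reference frame, so that an opening faces the display center when it is rotated by 180° relative to the item's position angle. Which of the two triangles points upward depends on that convention. The sketch applies no check against chance inward-facing items among the randomly assigned orientations, because the text does not state whether such cases were excluded.

```python
import random

ORIENTATIONS = [0, 60, 120, 180, 240, 300]  # mouth angles reported in the text
COLORS = ["blue", "red", "green", "purple", "orange", "mint"]


def inward_orientation(position_angle):
    """Mouth angle facing the display center (assumed shared reference frame)."""
    return (position_angle + 180) % 360


def make_orientations(condition, rng=random):
    """Return six distinct mouth orientations, one per location index 0..5."""
    positions = [i * 60 for i in range(6)]  # assumed item layout on the circle
    if condition == "grouped":
        # All six openings face the center, yielding the illusory star.
        return [inward_orientation(p) for p in positions]
    if condition == "partial":
        # Every second item faces the center, yielding one of two triangles.
        start = rng.choice([0, 1])
        triangle = [start, start + 2, start + 4]
        result = [None] * 6
        for i in triangle:
            result[i] = inward_orientation(positions[i])
        # The other three items receive the three unused orientations at random.
        remaining = [o for o in ORIENTATIONS if o not in result]
        rng.shuffle(remaining)
        free_slots = [i for i in range(6) if i not in triangle]
        for i, orientation in zip(free_slots, remaining):
            result[i] = orientation
        return result
    if condition == "ungrouped":
        result = ORIENTATIONS[:]
        rng.shuffle(result)
        return result
    raise ValueError(f"unknown condition: {condition}")


def make_memory_display(condition, rng=random):
    """Pair a random color permutation with the condition's orientations."""
    colors = COLORS[:]
    rng.shuffle(colors)
    return list(zip(colors, make_orientations(condition, rng)))
```

Because every condition draws on the same six orientations and six colors, the sketch preserves the property emphasized above: the displays differ only in how the orientations are arranged, not in which features are present.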
Procedure and Design. Figure 1B illustrates an example trial sequence. Each trial started with the presentation of a central white fixation circle (0.6° × 0.6°), which remained on the screen for the entire trial. After 300 ms, two white arrows (1.1° × 1.1°) appeared above and below the fixation circle for 300 ms, with both arrows pointing either to the left or to the right (with equal probability). After a short delay period (that lasted for a random interval between 300 and 500 ms), the memory display appeared for 300 ms, presenting an ungrouped, partially grouped, or grouped configuration on the cued side (i.e., as indicated by the initially presented arrows) together with a gray placeholder configuration on the uncued side. This was followed by a 1000-ms retention interval during which a blank screen was presented. Next, a test display appeared, consisting of a single pacman item on the cued side and a single gray circle on the uncued side, each positioned randomly at one of the six possible item locations (that had been occupied in the memory array) on its respective side. The probe display was presented until the participant issued a response: pressing the left or, respectively, the right mouse key to indicate whether the probe item was the same as or different from the pacman at the same location in the preceding memory display. Participants were instructed to respond as accurately as possible. In half of the trials, the probe on the cued side was identical (in terms of both color and gap orientation) to the item presented at that particular location in the previous memory display (no-change condition). In the other half of trials, the probe item was changed in either color or orientation (with equal probability) relative to the probed item in the memory array.
The change was realized by presenting the probed item in either the color or the orientation of one of the other five items (randomly selected) in the memory display, thus encouraging observers to memorize individual items as conjunctions of color and orientation (rather than just independent sets of orientations and colors).
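The probe logic described above can be sketched as follows. This is an illustrative Python sketch with hypothetical names, not the original Matlab code. It samples the change type on each trial with the stated probabilities (one half no change, one quarter color change, one quarter orientation change); whether the experiment instead balanced exact trial counts per condition is not specified in the text and is left open here.

```python
import random


def make_probe(memory_items, rng=random):
    """Build a probe for one trial.

    memory_items: list of six (color, orientation) tuples, one per location,
    with all colors and all orientations distinct.
    Returns (location, probe_item, change_type).
    """
    location = rng.randrange(len(memory_items))
    color, orientation = memory_items[location]
    # Listing "none" twice gives it half of the probability mass.
    change_type = rng.choice(["none", "none", "color", "orientation"])
    if change_type != "none":
        # The changed feature is borrowed from one of the other five items,
        # so the new value was present in the memory display.
        others = [i for i in range(len(memory_items)) if i != location]
        donor_color, donor_orientation = memory_items[rng.choice(others)]
        if change_type == "color":
            color = donor_color
        else:
            orientation = donor_orientation
    return location, (color, orientation), change_type
```

Because the changed value always comes from another item in the same display, a probe cannot be classified correctly by remembering only the set of colors or the set of orientations; the feature must be bound to its location and item, which is the rationale given above.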