The authors have declared that no competing interests exist.
Conceived and designed the experiments: JRG LRS GRB. Performed the experiments: JRG GRB. Analyzed the data: JRG GRB. Contributed reagents/materials/analysis tools: LRS GRB. Wrote the paper: JRG LRS GRB.
Magnetoencephalography (MEG), a noninvasive technique for characterizing brain electrical activity, is gaining popularity as a tool for assessing group-level differences between experimental conditions. One method for assessing task-condition effects involves beamforming, in which a weighted sum of field measurements is used as a spatial filter to estimate activity on a voxel-by-voxel basis. However, this method has been shown to produce inhomogeneous smoothness differences as a function of signal-to-noise across a volumetric image, which can then produce false positives at the group level. Here we describe a novel method for group-level analysis with MEG beamformer images that uses the peak locations within each participant's volumetric image to assess group-level effects. We compared our peak-clustering algorithm with SnPM (statistical nonparametric mapping) using simulated data. We found that our method was immune to artefactual group effects that can arise from inhomogeneous smoothness differences across a volumetric image. We also applied our peak-clustering algorithm to experimental data and found that the regions it identified corresponded with task-related regions reported in the literature. These findings suggest that our technique is a robust method for group-level analysis with MEG beamformer images.
The use of magnetoencephalography (MEG) as a research tool for brain imaging in both normal and clinical populations is burgeoning. With advances in signal processing, beamforming has gained traction as a meaningful approach to source localization in MEG. In beamforming, a weighted sum of field measurements is used as a spatial filter to tune an estimate of neural activity (i.e., power) within a prespecified time window and frequency band on a voxel-by-voxel basis. This produces a whole-brain volumetric image of signal power change that can be used for group-level analyses.
One problem in conventional MEG group analysis is that individual beamformer images are not homogeneously smooth; the images are information-rich around strong sources, yet very smooth elsewhere.
In this paper, we try to sidestep this reconstruction problem by compressing the volumetric image to a point list of local maxima, which in turn simplifies the statistics. This is advantageous because one often ultimately wishes to interrogate individual participants' beamformer estimates of electrical activity, which have been shown to be truly reliable only at the image peaks.
The paper is divided into three sections. In the first section, we describe the peak-clustering algorithm and define a method for correcting for multiple comparisons when testing over a range of peaks for group-level effects. In the second section, we compare our peak-clustering algorithm against SnPM using simulated data. In the third section, we utilize our algorithm to test for group-level effects in experimental data.
To compare the distribution of the M top-ranked image peaks (per person) over a group of participants against any random selection of peaks, we used the following algorithm (MATLAB code is available from the corresponding author on request):
1. Rank order the image peaks for each participant and store their corresponding locations. Since the test is based on rank order, the user must specify an interest in either positive or negative peaks. For the data presented in this manuscript, images were created from normalized t-tests between conditions.
2. Take the coordinates of the top M peaks from each of N participants. Construct the smallest possible ellipsoid that contains a single peak from each participant. The issue here is that the top peak in participant 1 may be at the same location as the 3rd peak in participant 3, etc. By selecting from M peaks, one trades off the precise peak order against spatial resolution (see later).
3. Establish whether this ellipsoid is smaller (in terms of its major radius) than one would expect by chance. The radius under the null hypothesis is computed by randomly assigning ranks to peak locations and repeating step 2 a large number of times (500 in this paper). This produces the distribution of radii one would expect by chance (if peak rank were not important).
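Step 1 needs a ranked list of local maxima for each image. The sketch below is a minimal pure-Python illustration, assuming a volume stored as a nested list `vol[i][j][k]` and the same 18-neighbour definition of a local maximum (face- and edge-sharing voxels) that the simulations described later attribute to `spm_max`. The function name `ranked_peaks` and the `sign` switch for negative peaks are hypothetical, and tie handling is not guaranteed to match `spm_max`.

```python
from itertools import product

# 18-connectivity: the 6 face- and 12 edge-sharing neighbours of a voxel
# (the 8 corner neighbours and the voxel itself are excluded).
OFFSETS_18 = [d for d in product((-1, 0, 1), repeat=3)
              if 0 < sum(abs(c) for c in d) <= 2]


def ranked_peaks(vol, sign=1):
    """Return [(value, (i, j, k)), ...] for every local maximum of sign * vol,
    sorted from the highest to the lowest value.

    sign=1 ranks positive peaks; sign=-1 ranks negative peaks (local minima).
    A voxel counts as a maximum if no in-bounds 18-neighbour is strictly larger.
    """
    nx, ny, nz = len(vol), len(vol[0]), len(vol[0][0])
    peaks = []
    for i, j, k in product(range(nx), range(ny), range(nz)):
        v = sign * vol[i][j][k]
        is_max = all(
            sign * vol[i + di][j + dj][k + dk] <= v
            for di, dj, dk in OFFSETS_18
            if 0 <= i + di < nx and 0 <= j + dj < ny and 0 <= k + dk < nz
        )
        if is_max:
            peaks.append((v, (i, j, k)))
    peaks.sort(key=lambda p: p[0], reverse=True)
    return peaks
```

Taking the first M entries of the returned list gives the top-M peak locations used in step 2.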
To give a simple example, how likely is it that the image maxima for ten participants (N = 10; one peak each, so M = 1) are within 1 cm of one another? To answer this, one can compute how close the image maxima will be by chance by simply taking a random image peak from each participant and repeating this process to obtain a null distribution of ellipsoid radii. One then computes the same size metric using the top-ranked peak from each participant and reads off the proportion of randomly drawn ellipsoids that are smaller than this, which gives the p-value (e.g., p < 0.01).
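The M = 1 example can be sketched as a permutation test. This is a minimal pure-Python illustration, not the authors' MATLAB code, and it rests on assumptions: each participant's peaks are (x, y, z) tuples in mm, ranked best first; the size of a point set is taken as 1.96 times its standard deviation along the principal axis (the scale factor is an assumption, and because it multiplies observed and null radii alike it does not change the p-value); and the common (1 + count) / (1 + permutations) estimate is used so that p is never exactly zero. All function names are hypothetical.

```python
import math
import random


def largest_eig_sym3(a):
    """Largest eigenvalue of a symmetric 3x3 matrix (closed-form trigonometric method)."""
    p1 = a[0][1] ** 2 + a[0][2] ** 2 + a[1][2] ** 2
    if p1 == 0:  # already diagonal
        return max(a[0][0], a[1][1], a[2][2])
    q = (a[0][0] + a[1][1] + a[2][2]) / 3
    p = math.sqrt((sum((a[i][i] - q) ** 2 for i in range(3)) + 2 * p1) / 6)
    b = [[(a[i][j] - (q if i == j else 0.0)) / p for j in range(3)] for i in range(3)]
    det_b = (b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
             - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
             + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0]))
    phi = math.acos(max(-1.0, min(1.0, det_b / 2))) / 3
    return q + 2 * p * math.cos(phi)


def major_radius(points, scale=1.96):
    """scale * standard deviation along the principal axis of (x, y, z) points."""
    n = len(points)
    mean = [sum(pt[d] for pt in points) / n for d in range(3)]
    cov = [[sum((pt[i] - mean[i]) * (pt[j] - mean[j]) for pt in points) / (n - 1)
            for j in range(3)] for i in range(3)]
    return scale * math.sqrt(max(largest_eig_sym3(cov), 0.0))


def top_peak_test(peak_lists, n_perm=500, seed=0):
    """M = 1 case: are the participants' top-ranked peaks closer than chance?

    peak_lists[s] holds participant s's peak locations, ranked best first.
    Returns (observed major radius, permutation p-value).
    """
    rng = random.Random(seed)
    observed = major_radius([peaks[0] for peaks in peak_lists])
    # Null distribution: ignore rank and draw one peak at random per participant.
    null = [major_radius([rng.choice(peaks) for peaks in peak_lists])
            for _ in range(n_perm)]
    n_smaller = sum(r <= observed for r in null)
    return observed, (1 + n_smaller) / (1 + n_perm)
```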
For a given number of participants (N) and peaks (M), a k-means clustering procedure was used iteratively to derive M separate ellipsoids (ideally each of N points) from N × M points. Clusters were trimmed such that each set contained at most one point per participant (selecting the point closest to the centroid). At the end of the iterative procedure (typically 30 iterations), one is left with a set of the smallest clusters (based on the standard deviation of the point list) for varying numbers of participants (from a user-specified minimum up to a maximum of N). For these point lists, ellipsoid axes were computed from the eigenvectors, and the standard deviation in each direction (and hence the 95th percentiles) from the corresponding eigenvalues.
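The clustering step can be sketched as a k-means variant with per-participant trimming. This is a simplified pure-Python illustration under stated assumptions, not the authors' implementation: the text does not say how centroids were initialized, so the sketch assumes deterministic farthest-point initialization; it uses the root-mean-square distance from the centroid as a stand-in for the point-list standard deviation; and it leaves ellipsoid fitting and selection by participant count to the caller. The names `kmeans_trimmed` and `spread` are hypothetical.

```python
import math


def spread(points, idx):
    """RMS distance (mm) of the selected points from their centroid."""
    n = len(idx)
    mean = [sum(points[i][d] for i in idx) / n for d in range(3)]
    return math.sqrt(sum(math.dist(points[i], mean) ** 2 for i in idx) / n)


def kmeans_trimmed(points, labels, m, n_iter=30):
    """Cluster N*M peaks into m groups holding at most one peak per participant.

    points[i] is an (x, y, z) peak location; labels[i] is its participant.
    Returns a list of (spread, {participant: point index}) sorted by spread.
    """
    # Farthest-point initialisation (an assumption; deterministic for testing).
    centroids = [points[0]]
    while len(centroids) < m:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))

    clusters = [{} for _ in range(m)]
    for _ in range(n_iter):
        # 1) Assign every peak to its nearest centroid.
        groups = [[] for _ in range(m)]
        for i, p in enumerate(points):
            nearest = min(range(m), key=lambda k: math.dist(p, centroids[k]))
            groups[nearest].append(i)
        # 2) Trim: per participant, keep only the peak closest to the centroid.
        for k, members in enumerate(groups):
            best = {}
            for i in members:
                s = labels[i]
                if s not in best or (math.dist(points[i], centroids[k])
                                     < math.dist(points[best[s]], centroids[k])):
                    best[s] = i
            clusters[k] = best
            # 3) Move the centroid to the mean of the trimmed cluster.
            if best:
                centroids[k] = tuple(sum(points[i][d] for i in best.values()) / len(best)
                                     for d in range(3))
    return sorted(((spread(points, list(c.values())), c) for c in clusters if c),
                  key=lambda t: t[0])
```

Trimming after every assignment step keeps each cluster a valid "one peak per participant" set, so its size directly gives the N of the resulting ellipsoid.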
The peak-clustering algorithm requires some a priori selection of the parameter M, the number of top-ranked peaks to consider in the analysis. Typically, therefore, it is necessary to test a range of values of M, which incurs a corresponding multiple comparisons penalty. In this section, we examine the dependence of our results on this parameter and propose an approximate heuristic for dealing with it in the future.
The relationship between the number of peaks used (M) and the 95% significant (maximum) radius of the confidence ellipsoid (in mm) for subgroups of N = 5 (blue), 7 (green) and 10 (red). Intuitively, the larger the N, the larger the size of the cluster one would expect to occur by chance. In contrast, the larger the number of peaks per subject (M) considered, the easier it will be to reach a given cluster size, hence the 95% threshold decreases as more peaks are included in the analysis.
The parameter
A plot of the heuristic
The next problem is how to set an appropriate significance level. There is a single univariate null hypothesis: that the peaks are clustered by chance. However, as we change (increase) M, we are retesting the same hypothesis with different subsets of data, so a multiple comparisons penalty is necessary. One simple solution would be to examine only the function minima at each value of N. One problem here is that the minima are relatively flat and their smoothness depends on the number of random permutation steps performed, which is computationally intensive. Also, one can see from
Another possibility is to consider the range of M that defines this minimum. This approach does not rely on identifying the minima (so it is more robust) and can be computed for all N at once. However, it carries a multiple comparisons penalty. Note that a completely new (i.e., independent) set of data is introduced only each time the number of peaks is doubled.
Under a Bonferroni correction, the significance level should therefore be reduced each time the number of peaks is doubled, since each doubling adds one independent test. The test-wise error rate that yields a family-wise error rate of 0.05 is based on the following Bonferroni correction:
In order to test algorithm performance against some ground truth, we simulated a single dipolar source across a group of participants. The same single-sphere head model and sensor locations were used for each simulated participant. System white noise was simulated at 10 fT/√Hz over a bandwidth of 80 Hz. Data for 10 participants were simulated, differing only in the simulated source location and white-noise realization. In each simulated participant, a random seed location was generated, drawn from a Gaussian distribution of standard deviation 5 mm, centered on MNI location x = 52, y = −29, z = 13. The nearest canonical mesh location
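Two of these simulation settings can be made concrete. For white noise with a flat spectral density, the RMS amplitude is the density times the square root of the bandwidth, so 10 fT/√Hz over 80 Hz corresponds to roughly 89.4 fT RMS. The seed draw is sketched assuming an isotropic Gaussian (independent in x, y and z); snapping each seed to the nearest canonical mesh location is not reproduced. The function names are hypothetical.

```python
import math
import random

NOISE_DENSITY = 10.0               # fT / sqrt(Hz), from the text
BANDWIDTH = 80.0                   # Hz, from the text
CENTER_MNI = (52.0, -29.0, 13.0)   # mm, from the text
SEED_SD = 5.0                      # mm, from the text


def white_noise_rms(density, bandwidth):
    """RMS amplitude of band-limited white noise with a flat spectral density."""
    return density * math.sqrt(bandwidth)


def simulated_seeds(n_participants, seed=0):
    """One source location per simulated participant, drawn from an isotropic
    Gaussian (independent in x, y and z) around the MNI center."""
    rng = random.Random(seed)
    return [tuple(c + rng.gauss(0.0, SEED_SD) for c in CENTER_MNI)
            for _ in range(n_participants)]


print(f"{white_noise_rms(NOISE_DENSITY, BANDWIDTH):.1f} fT RMS")  # 89.4 fT RMS
```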
The beamformer is simply a spatially filtered expression of the MEG sensor data.
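The beamformer equation itself does not survive in this text. For orientation only, the standard scalar LCMV formulation expresses the source estimate at location r as a weighted sum of the sensor data m(t), with weights that pass unit gain for the lead field l(r) while minimizing output power given the data covariance C. Implementations such as SAM differ in details (for example, how source orientation is chosen), so this is not necessarily the exact expression used here.

```latex
\hat{q}(\mathbf{r}, t) = \mathbf{w}(\mathbf{r})^{\top}\mathbf{m}(t),
\qquad
\mathbf{w}(\mathbf{r}) = \frac{\mathbf{C}^{-1}\mathbf{l}(\mathbf{r})}
                              {\mathbf{l}(\mathbf{r})^{\top}\mathbf{C}^{-1}\mathbf{l}(\mathbf{r})},
\qquad
\hat{P}(\mathbf{r}) = \mathbf{w}(\mathbf{r})^{\top}\mathbf{C}\,\mathbf{w}(\mathbf{r})
                    = \frac{1}{\mathbf{l}(\mathbf{r})^{\top}\mathbf{C}^{-1}\mathbf{l}(\mathbf{r})}
```

Computing the power estimate for each voxel, and contrasting it between active and control windows, yields the kind of volumetric power-change image described above.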
Note that in the experimental data analysis stage, we used proprietary software (synthetic aperture magnetometry, SAM) to analyze the data.
Different participant groups were constructed by randomly drawing 8 of these 10 images, twenty times. For each participant group, we used SnPM (multiple participant, one-sample t-test, variance smoothing 25 mm) to identify significant (family-wise error = 0.05) positive effects across the normalized power difference images. Using the peak-clustering algorithm, we used the same data to look for clusters within the top 5 image peaks that were smaller than one would expect by chance (i.e., M = 5 peaks, N = 8 participants). For each simulated group, we compiled a list of the significant local maxima (p < 0.05 corrected) in the SnPM images and a list of the centers of the peak clusters deemed significant. We classed a hit as a peak or ellipsoid center closer than 20 mm to the initial MNI seed location, and a miss as any significant peak or ellipsoid center outside this range. The peaks were defined as local image maxima identified using the SPM function spm_max based on 18 neighbors. This means that two local maxima can be as close as a single (non-maximal) voxel apart.
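The hit/miss scoring can be sketched as follows. The sketch assumes locations are (x, y, z) tuples in mm in MNI space and that distances are measured to the center of the seed distribution (52, −29, 13), which is our reading of "the initial MNI seed location"; the function names are hypothetical.

```python
import math

SEED_MNI = (52.0, -29.0, 13.0)  # center of the simulated seed distribution (mm)
HIT_RADIUS_MM = 20.0            # a hit must lie closer than 20 mm


def score(locations, target=SEED_MNI, radius=HIT_RADIUS_MM):
    """Return (hits, misses) for significant peak or ellipsoid-center locations."""
    hits = sum(1 for loc in locations if math.dist(loc, target) < radius)
    return hits, len(locations) - hits


def score_groups(groups):
    """Sum hits and misses over all simulated participant groups."""
    total_hits = total_misses = 0
    for locations in groups:
        hits, misses = score(locations)
        total_hits += hits
        total_misses += misses
    return total_hits, total_misses
```

Applied separately to the SnPM maxima and to the significant ellipsoid centers, these totals correspond to the hit and miss counts compared across source magnitudes.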
We assessed the performance of our peak-clustering algorithm on experimental data. In our experiment, ten right-handed volunteers (mean age = 29.4 years, range = 20–36 years; 2 males) gave written informed consent following Aston University ethical guidelines and participated in the MEG study. The protocol was approved by the Aston University Institutional Review Board and complied with all guidelines expressed in the Declaration of Helsinki. Briefly, participants (N = 10) performed a superordinate-level categorization task on pictures of objects drawn from 3 living and 3 nonliving categories (see
During study 1, participants were shown a 1000 ms red fixation cross, followed by a 300 ms category probe. After a variable (1000, 1050, or 1100 ms) delay interval, participants were shown a target object for 800 ms.
Data for each participant were edited and filtered to remove environmental and physiological artefacts. A linearly constrained minimum variance (LCMV) beamformer was then used to produce 3-dimensional images of cortical power changes.
We used SnPM (multiple participant, one-sample t-test, variance smoothing 6, 12, and 24 mm) to identify significant (family-wise error = 0.05) positive effects across the normalized power difference images. We also used our peak-clustering algorithm to test over a range of M values from M = 2 through 40 (using only positive peaks), which means that to maintain a family-wise error rate of 0.05, our test-wise error rate was adjusted to p = 0.0094. After multiple comparisons correction, we were left with a number of significant clusters of peaks (see
Location                        BA  N  M   Center (x, y, z)  Volume (mm³)  Major radius (mm)  Mean value  p-value
Left Inferior Occipital Gyrus   19
                                    6  5   −37, −83, −10     7,777         22.8               1.95        0.00
                                    7  6   −35, −85, −14     16,488        28.4               1.91        0.01
                                    5  8   −49, −73, −9      458           17.4               1.71        0.01
                                    6  8   −34, −85, −9      2,352         15.7               1.99        0.00
                                    6  9   −34, −85, −9      2,352         15.7               1.99        0.00
                                    6  10  −34, −85, −9      2,352         15.7               1.99        0.01
                                    6  11  −34, −85, −9      2,352         15.7               1.99        0.00
                                    6  12  −34, −85, −9      2,352         15.7               1.99        0.01
                                    5  13  −35, −85, −6      1,227         11.6               1.99        0.00
                                    5  14  −35, −85, −6      1,227         11.6               1.99        0.01
Right Superior Temporal Gyrus   38
                                    5  11  49, 3, −14        406           12.6               1.78        0.00
                                    5  12  49, 3, −14        406           12.6               1.78        0.01
                                    5  13  49, 3, −14        406           12.6               1.78        0.01
                                    6  16  49, 5, −14        1,839         12.4               1.70        0.01
Top panel shows the total number of significant local maxima over 20 simulated subject groups (each with a single simulated source) identified using SnPM (dotted) and the peak-clustering method (solid) as source magnitude is increased. Local maxima within 2 cm of the simulated source are defined as hits and those farther than 2 cm as misses. Note that both methods consistently identify the correct source location at high SNR (20 hits, 0 misses), but that SnPM tends to produce a large number of artefactual significant regions at moderate SNR. This error rate is due to the smoothness of the beamformer images, which gives rise to statistically significant overlapping side lobes. These effects are shown in the lower panel, where maps of the percentage of significant voxels (from the 20 groups) are shown in a glass brain.
The SnPM analysis did not identify any regions showing significant positive power differences when using 6 or 12 mm variance smoothing. However, a single region centered in right anterior middle to superior temporal gyrus (Talairach coordinates of center = 48, 3, −18) was identified when we set variance smoothing to 24 mm (see
A) The region in right anterior middle to superior temporal gyrus identified by the SnPM analysis as showing significantly greater power for living compared with nonliving objects. B) The two regions identified by the peak-clustering algorithm as showing significantly greater power for living compared with nonliving objects. Red = Inferior Occipital Gyrus; Blue = Superior Temporal Gyrus. The sagittal images show the approximate slice locations (z coordinates are given below each slice) shown on the corresponding axial image (at right, blue lines, arranged inferior to superior) on a template brain.
We have presented a peak-clustering algorithm for group-level analysis with MEG beamformer images. Our algorithm determines whether a range of image peaks (M) is closer together than expected by chance. We compared the performance of the peak-clustering algorithm with that of a more traditional group imaging method (SnPM) and found the algorithm to be robust to artefacts of smoothness that can give rise to erroneous MEG beamformer group effects. There is an important distinction here between false positives due to Type I error and the effects we are trying to correct for. Both SnPM and the peak-clustering algorithm have, by definition, the correct Type I error rate (as it is set in both cases by permutation). Nor is there a problem with SnPM itself. The issue we are trying to correct for here is one of source reconstruction, where a small number of data channels are projected into a large number of voxels, resulting in images that are very smooth in certain regions. The peak-clustering algorithm is therefore a way of pruning away redundant information from beamformer images, reducing the likelihood that these smooth and information-sparse regions of source space contribute to the group effect.
Our approach is similar to a dipole-fit analysis approach used previously.
As mentioned previously, the algorithm effectively tries to compensate for the few-to-many (i.e., channel-to-voxel) mapping in M/EEG volumetric source reconstruction. This problem is exacerbated in beamformer analyses because spatial resolution depends not only on system sensitivity but also on source power.
The algorithm requires a parameter that defines the number of top-ranked peaks to consider (M) for each participant. This parameter has important implications for cluster size. Since the algorithm first computes chance volume sizes using a random selection of peaks, using a small number of peaks can produce a large cluster size for the null distribution. Rather than arbitrarily determining the number of peaks for the algorithm to consider, we developed a heuristic that balances peak rank against cluster size; it requires the user to test over a range of M values and to apply a Bonferroni correction for multiple comparisons. For example, to maintain a family-wise error rate of 0.05 when testing M = 2 through 40, the test-wise error rate becomes 0.0094, because the correction counts each doubling of M rather than each individual value of M. It is important to note that the choice of M can be made based on simulations or on the data themselves, as long as an appropriate multiple comparisons correction is made. For this reason we had expected the algorithm to be more conservative than volumetric approaches (such as SnPM), but by dealing with the image only in its compressed point-list form, rather than with all voxels, we have also considerably reduced the multiple comparisons correction necessary. This may explain why, contrary to our expectation, the algorithm picked out significant features in the experimental dataset that were not apparent in the (volume-corrected) SnPM tests.
In our experimental study, participants were required to perform a superordinate-level categorization task on pictures of living and nonliving objects. The SnPM results depended on the variance smoothing used. With 6 or 12 mm smoothing, no regions survived the significance threshold. However, with 24 mm smoothing, a single region in right anterior middle to superior temporal gyrus showed significantly greater power for living than nonliving objects. Using the peak-clustering algorithm, we also found a significant cluster of activity in right anterior superior temporal gyrus, overlapping with the region identified by the SnPM analysis. In addition, we identified a region in left inferior temporal gyrus showing greater power for living than nonliving objects, which we did not find in our SnPM analysis. To determine whether the SnPM analysis yielded a peak in left inferior temporal gyrus that simply did not survive whole-brain correction, we examined the t map produced in our SnPM analysis. We found a cluster of activity centered in left inferior temporal gyrus (peak value = 2.95), which suggests that left inferior temporal gyrus would be significant in a region-of-interest analysis (rather than a whole-brain analysis) using roughly 7 independent voxels (or ROIs). This accords with our explanation that the peak-clustering analysis has a less stringent multiple comparisons penalty, as it considers only a limited number of image peaks per subject (indeed, for these analyses there were 8 peaks per participant). We would expect both of these regions to be active based on previous neuroimaging studies, which have suggested that the inferior temporal/occipital gyri are important for form recognition, and that reliance on visual form is greater for living than for nonliving objects.
As with many nonparametric techniques, the peak-clustering method sacrifices some sensitivity for an increase in robustness, and requires that some feature of interest (here, each peak) be identifiable in the majority of individuals. This is not the case in standard random- or fixed-effects models, in which subthreshold effects in individuals can be picked up at the group level. Allowing the algorithm to identify smaller subgroups is a matter for debate. In some cases, the objective identification of subgroups might be a useful feature of the algorithm. Forcing the algorithm to select only those regions that have a local maximum in every participant makes it extremely conservative. One could also argue that a group effect is meaningless if it does not include the whole group. Yet, in classical volumetric approaches, random-effects analysis allows some heterogeneity in the effects over the population. As long as the values of N (e.g., N = 9 for a group of 10) are reported, readers can make their own inference on the strength of the finding (e.g., an effect in 90% of the participants). Also, the technique will not be sensitive to truly spatially extended regions of electrical activity that are not artefacts of smoothness, as only the peaks within each image are considered in the analysis.
In sum, we have found that our peak-clustering technique offers a number of advantages over current group-level analysis approaches with MEG. The method is immune to the inhomogeneous smoothness introduced by imperfect volumetric M/EEG source reconstruction and exacerbated in beamformer implementations; indeed, it makes no assumptions about the underlying image properties. In addition, the null distributions of source locations are constructed from the data themselves, and the randomization testing takes into account the multiple comparisons problem (for a given M). As the test is based on rank, it should be relatively robust to physiological artefacts, and by default we would leave artefact identification to post hoc analyses. For example, eyeball artefacts should result in significant clusters in the eyes. Subgroup statistics are also available; for example, bounds for any 5 of N participants having significantly clustered peaks can be tested automatically. Finally, by providing confidence intervals on peak location, the technique is well suited to situations in which one would like to make spatial inferences about peak location, for example, whether peaks from a particular subject group derive from a specific cortical location.