In a recent paper published in PLOS Biology, researchers used computational modeling to reconstruct a piece of music from neural recordings, applying encoding models and ablation analyses to study the spatial neural dynamics underlying music perception.
Study: Music can be reconstructed from human auditory cortex activity using nonlinear decoding models. Image source: lianleonte / Shutterstock
Background
Music, a universal human experience, activates many of the same brain regions as language. Neuroscientists have studied the neural basis of music perception for many years and have identified distinct neural correlates of musical elements, including timbre, melody, harmony, pitch, and rhythm. However, it remains unclear how these neural networks interact to process the complexity of music.
“One of the things about music for me is that it has prosody (rhythms and intonation) and emotional content.” As the field of brain-machine interfaces advances, he explains, “this research could help add more musicality to future brain implants for people with neurological impairments or developmental disabilities that affect speech.”
- Dr. Robert Knight, University of California, Berkeley
About the study
In the present study, researchers used stimulus reconstruction to examine how the brain processes music. A total of 2,668 electrocorticography (ECoG) electrodes placed on the cortical surface of 29 neurosurgical patients were used to record neural activity, i.e., intracranial electroencephalography (iEEG) data, while the patients passively listened to a three-minute excerpt of the Pink Floyd song "Another Brick in the Wall, Part 1."
The use of passive listening as a method of stimulus presentation prevented confounding the neural processing of music with motor activity and decision making.
Based on data from 347 of the 2,668 electrodes, the researchers reconstructed the song. The reconstruction closely resembled the original, albeit with less detail; for example, the words in the reconstructed song were much less clear. Specifically, they used regression-based decoding models to reconstruct this auditory stimulus (in this case, a three-minute song excerpt) from the neural activity.
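Conceptually, such regression-based decoding fits a mapping from time-lagged neural activity to each frequency band of the song's auditory spectrogram. The sketch below is a minimal illustration of this idea using ridge regression in scikit-learn; the array shapes, lag window, and random stand-in data are assumptions for illustration only, not the authors' actual pipeline.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Stand-in data, much smaller than the real recordings (347 electrodes, ~3 minutes):
# HFA features for 50 electrodes and a 32-band auditory spectrogram, both at 100 Hz.
rng = np.random.default_rng(0)
n_samples, n_electrodes, n_bands = 3000, 50, 32
hfa = rng.standard_normal((n_samples, n_electrodes))       # neural features
spectrogram = rng.standard_normal((n_samples, n_bands))    # stimulus to reconstruct

def add_lags(x, n_lags=20):
    """Stack time-lagged copies of the neural features (200 ms at 100 Hz).
    Edge wrap-around from np.roll is ignored for brevity."""
    return np.concatenate([np.roll(x, lag, axis=0) for lag in range(n_lags)], axis=1)

X = add_lags(hfa)
X_train, X_test, y_train, y_test = train_test_split(
    X, spectrogram, test_size=0.2, shuffle=False)          # preserve temporal order

decoder = Ridge(alpha=1.0).fit(X_train, y_train)           # one weight set per band
y_pred = decoder.predict(X_test)

# Decoding accuracy as the mean Pearson r across spectrogram bands.
r_per_band = [np.corrcoef(y_test[:, i], y_pred[:, i])[0, 1] for i in range(n_bands)]
print(f"mean decoding r = {np.mean(r_per_band):.3f}")
```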
Researchers have previously used similar methods to reconstruct speech from brain activity; however, this is the first time music has been reconstructed using such an approach.
iEEG offers high temporal resolution and an excellent signal-to-noise ratio, and it provides direct access to high-frequency activity (HFA), an index of non-oscillatory neural activity that reflects local information processing.
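For context, HFA is commonly estimated by band-pass filtering the recording in the high-gamma range and taking the amplitude envelope of the analytic signal. Below is a minimal SciPy sketch under that assumption; the 70–150 Hz band, sampling rate, and variable names are illustrative choices, not the study's exact parameters.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def high_frequency_activity(ieeg, fs, band=(70.0, 150.0), order=4):
    """Estimate the HFA envelope per electrode.

    ieeg: array of shape (n_electrodes, n_samples); fs: sampling rate in Hz.
    """
    nyq = fs / 2.0
    b, a = butter(order, [band[0] / nyq, band[1] / nyq], btype="bandpass")
    filtered = filtfilt(b, a, ieeg, axis=-1)        # zero-phase band-pass filter
    envelope = np.abs(hilbert(filtered, axis=-1))   # analytic amplitude (HFA envelope)
    # z-score each channel so electrodes are comparable across patients
    return (envelope - envelope.mean(axis=-1, keepdims=True)) / envelope.std(axis=-1, keepdims=True)

# Example with stand-in data: 4 electrodes, 10 s of iEEG sampled at 1,000 Hz.
rng = np.random.default_rng(1)
hfa = high_frequency_activity(rng.standard_normal((4, 10_000)), fs=1000)
print(hfa.shape)  # (4, 10000)
```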
Likewise, nonlinear decoding models applied to the auditory and sensorimotor cortices have yielded the highest decoding accuracy and a remarkable ability to reconstruct intelligible speech. The team therefore combined iEEG with nonlinear decoding models to uncover the neural dynamics underlying music perception.
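To illustrate what "nonlinear" means relative to the linear sketch above, the following swaps in a small multilayer perceptron as the decoder; the architecture, hyperparameters, and stand-in data are arbitrary assumptions rather than the models used in the paper.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Stand-in HFA features and spectrogram targets (shapes as in the linear sketch).
rng = np.random.default_rng(0)
X = rng.standard_normal((3000, 50))
y = rng.standard_normal((3000, 32))
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

nonlinear_decoder = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(128, 64),   # two hidden layers (illustrative sizes)
                 max_iter=500, random_state=0),
)
nonlinear_decoder.fit(X_train, y_train)
print(f"held-out R^2 = {r2_score(y_test, nonlinear_decoder.predict(X_test)):.3f}")
```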
The team also quantified the influence of data set duration and electrode density on reconstruction accuracy.
Anatomical location of song-responsive electrodes. (A) Electrode coverage across all 29 patients shown on the MNI template (N = 2,379). All presented electrodes are free of any artifactual or epileptic activity. The left hemisphere is plotted on the left. (B) Location of electrodes significantly encoding the song’s acoustics (Nsig = 347). Significance was determined by STRF prediction accuracy bootstrapped over 250 resamples of the training, validation, and test sets. Marker color indicates the anatomical label as determined using the FreeSurfer atlas, and marker size indicates the STRF’s prediction accuracy (Pearson’s r between actual and predicted HFA). The same color code is used in the following panels and figures. (C) Number of significant electrodes per anatomical region. A darker shade indicates a location in the right hemisphere. (D) Average STRF prediction accuracy per anatomical region. Electrodes labeled as supramarginal, other temporal (i.e., other than STG), and other frontal (i.e., other than SMC or IFG) are grouped together, labeled as other, and shown in white/gray. Error bars indicate SEM. The data underlying this figure can be obtained at https://doi.org/10.5281/zenodo.7876019. HFA, high-frequency activity; IFG, inferior frontal gyrus; MNI, Montreal Neurological Institute; SEM, standard error of the mean; SMC, sensorimotor cortex; STG, superior temporal gyrus; STRF, spectrotemporal receptive field. https://doi.org/10.1371/journal.pbio.3002176.g002
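The significance test described in the caption, bootstrapping STRF prediction accuracy over 250 resamples, can be sketched roughly as follows; the ridge-based STRF, resampling scheme, and significance criterion here are simplified assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge

def bootstrap_strf_accuracy(X, y, n_boot=250, test_frac=0.2, seed=0):
    """Bootstrap Pearson's r between actual and predicted HFA for one electrode.

    X: (n_samples, n_features) lagged stimulus features (the STRF design matrix)
    y: (n_samples,) HFA time course of a single electrode
    """
    rng = np.random.default_rng(seed)
    n_test = int(len(y) * test_frac)
    rs = []
    for _ in range(n_boot):
        idx = rng.permutation(len(y))              # resample the train/test split
        train, test = idx[n_test:], idx[:n_test]
        pred = Ridge(alpha=1.0).fit(X[train], y[train]).predict(X[test])
        rs.append(np.corrcoef(y[test], pred)[0, 1])
    rs = np.asarray(rs)
    # Call the electrode significant if the bootstrap distribution of r stays above zero.
    return rs.mean(), np.percentile(rs, 2.5) > 0
```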
Results
The study results showed that both hemispheres of the brain were involved in music processing, with the superior temporal gyrus (STG) in the right hemisphere playing a more important role in music perception. Furthermore, although both the temporal and frontal lobes were active during music perception, a newly identified STG subregion was tuned to musical rhythm.
Data from 347 of the roughly 2,700 ECoG electrodes allowed the researchers to identify where music is encoded. The data showed that both hemispheres were involved in music processing, with a larger proportion of electrodes in the right hemisphere responding to the music than in the left (16.4% vs. 13.5%), a finding that stands in direct contrast to language, which notably elicits stronger responses in the left hemisphere of the brain.
In both hemispheres, however, most music-responsive electrodes were located over a region called the superior temporal gyrus (STG), which sits just above and behind the ear, suggesting that this region plays a crucial role in music perception.
Furthermore, the study results showed that nonlinear models provided the highest decoding accuracy, with an r-squared of 42.9%. Decoding accuracy also depended on which, and how many, electrodes were included in the model; for example, the ablation analysis showed that removing the 43 right-hemisphere rhythmic electrodes reduced decoding accuracy.
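An ablation analysis of this kind removes a defined electrode set, re-evaluates the decoder, and records the change in accuracy. Below is a minimal sketch assuming a ridge decoder and an index array marking the ablated electrodes; both are illustrative stand-ins, not the paper's exact procedure.

```python
import numpy as np
from sklearn.linear_model import Ridge

def ablation_effect(X_train, y_train, X_test, y_test, ablate_idx):
    """Mean decoding accuracy (Pearson r across targets) with and without
    the electrode columns listed in `ablate_idx`."""
    def fit_and_score(cols):
        pred = Ridge(alpha=1.0).fit(X_train[:, cols], y_train).predict(X_test[:, cols])
        rs = [np.corrcoef(y_test[:, i], pred[:, i])[0, 1] for i in range(y_test.shape[1])]
        return float(np.mean(rs))

    all_cols = np.arange(X_train.shape[1])
    kept = np.setdiff1d(all_cols, ablate_idx)
    return fit_and_score(all_cols), fit_and_score(kept)

# Hypothetical usage: compare accuracy before and after removing a labeled electrode set.
# full_r, ablated_r = ablation_effect(X_tr, y_tr, X_te, y_te, ablate_idx=right_rhythmic_idx)
```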
The electrodes included in the decoding model had unique functional and anatomical features, which also influenced the decoding accuracy of the model.
Finally, regarding the influence of dataset duration on decoding accuracy, the authors found that the model achieved 80% of the maximum observed decoding accuracy with only 37 seconds of data. This result underscores the usefulness of the predictive modeling approaches used in this study for small datasets.
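One way to picture this duration analysis is to train the decoder on progressively longer portions of the recording and track held-out accuracy; the sketch below assumes 100 Hz features and a ridge decoder, both illustrative stand-ins rather than the study's exact setup.

```python
import numpy as np
from sklearn.linear_model import Ridge

def accuracy_vs_duration(X, y, fs=100, durations_s=(10, 20, 37, 60, 90, 120), test_frac=0.2):
    """Held-out decoding accuracy as a function of training-set duration (seconds)."""
    n_test = int(len(y) * test_frac)
    X_tr_full, y_tr_full = X[:-n_test], y[:-n_test]
    X_te, y_te = X[-n_test:], y[-n_test:]
    curve = {}
    for d in durations_s:
        n = min(int(d * fs), len(y_tr_full))
        pred = Ridge(alpha=1.0).fit(X_tr_full[:n], y_tr_full[:n]).predict(X_te)
        rs = [np.corrcoef(y_te[:, i], pred[:, i])[0, 1] for i in range(y_te.shape[1])]
        curve[d] = float(np.mean(rs))
    return curve  # inspect where the curve reaches ~80% of its maximum
```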
The study data could have implications for brain-computer interface (BCI) applications, such as communication tools for people with disabilities that limit speech. Because BCI technology is relatively new, currently available BCI-based interfaces produce speech with an unnatural, robotic quality, which could be improved by incorporating musical elements. In addition, the study results could be clinically relevant for patients with auditory processing disorders.
Conclusion
Our results confirm and extend previous findings on music perception, including the dependence of music perception on a bilateral network with right lateralization. Within the spatial distribution of music information, redundant and unique components were distributed between the STG, sensorimotor cortex (SMC), and inferior frontal gyrus (IFG) in the left hemisphere and concentrated in the STG in the right hemisphere, respectively.
Future research could aim to extend electrode coverage to other cortical regions, vary the features of the nonlinear decoding models, and even add a behavioral dimension.