It has long been speculated whether communication between humans and machines based on cortical activity related to natural speech is possible. Over the past
decade, studies have suggested that it is feasible to recognize isolated aspects of speech from neural signals, such as auditory features, individual phones, or single words from a small vocabulary. However, until now it has remained an unsolved challenge
to decode continuously spoken speech from the neural substrate associated with
speech and language processing. Here, we show for the first time that
continuously spoken speech can be decoded into the expressed words from
intracranial electrocorticographic (ECoG) recordings. Specifically, we implemented a system, which we call Brain-To-Text, that models single phones, employs techniques from automatic speech recognition (ASR), and thereby transforms brain activity during speech production into the corresponding textual representation.
Our results demonstrate that our system can achieve word error rates as low as
25% and phone error rates below 50%. Additionally, our approach contributes to
the current understanding of the neural basis of continuous speech production
by identifying those cortical regions that hold substantial information about
individual phones. In conclusion, the Brain-To-Text system described in this
paper represents an important step toward human-machine communication based on
imagined speech.
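The abstract only names the ingredients of the system, single-phone models combined with ASR-style decoding, so the following Python sketch illustrates that general idea: candidate words are scored against a sequence of neural feature frames using a left-to-right phone HMM with Viterbi decoding. Everything here is an assumption for illustration, including the Gaussian phone models, the toy lexicon, and all names; it is a minimal sketch of the decoding principle, not the authors' actual pipeline.

```python
import numpy as np

def log_gauss(x, mean, var):
    """Log-density of a diagonal Gaussian; one phone = one Gaussian (illustrative)."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var, axis=-1)

def score_word(frames, phone_seq, models):
    """Viterbi log-likelihood of a left-to-right phone HMM over neural feature frames.

    frames:    (T, D) array of per-frame features (hypothetical ECoG features)
    phone_seq: list of phone labels spelling one candidate word
    models:    dict mapping phone -> (mean, var) of a diagonal Gaussian
    """
    T, S = len(frames), len(phone_seq)
    if T < S:
        return -np.inf  # too few frames to traverse every phone state
    # Emission log-likelihoods for each frame under each phone state, shape (T, S).
    emit = np.stack([log_gauss(frames, *models[p]) for p in phone_seq], axis=1)
    # Viterbi trellis: each state may self-loop or advance to the next phone.
    delta = np.full((T, S), -np.inf)
    delta[0, 0] = emit[0, 0]
    for t in range(1, T):
        stay = delta[t - 1]
        advance = np.concatenate(([-np.inf], delta[t - 1, :-1]))
        delta[t] = np.maximum(stay, advance) + emit[t]
    return delta[-1, -1]  # path must end in the word's final phone

# Toy usage with synthetic data (hypothetical phones, lexicon, and features).
rng = np.random.default_rng(0)
D = 8  # feature dimension per frame
phones = ["k", "ae", "t", "d", "ao", "g"]
models = {p: (rng.normal(size=D), np.ones(D)) for p in phones}
lexicon = {"cat": ["k", "ae", "t"], "dog": ["d", "ao", "g"]}
# Synthesize frames that follow the phone models for "cat".
frames = np.concatenate(
    [rng.normal(models[p][0], 1.0, size=(5, D)) for p in lexicon["cat"]]
)
best = max(lexicon, key=lambda w: score_word(frames, lexicon[w], models))
print(best)  # expected: "cat"
```

This toy collapses each phone to a single Gaussian state and scores a closed word list, whereas a full ASR-style decoder of the kind the abstract describes would use trained phone models, a pronunciation dictionary, and a language model over continuous utterances.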