Deep Learning in MIR, from Concept to Conversation

By | October 2, 2012

With ISMIR now less than a week away, it’s time to start outlining goals and setting expectations for the conference. There is no shortage of exciting ideas that can be traced back to our annual pilgrimage, and this year looks to be no different.

Personally, I’m genuinely stoked for the opportunity to present a position piece at the MIRrors session on Friday, titled “Moving Beyond Feature Design: Deep Architectures and Automatic Feature Learning in Music Informatics.” For reasons I’ll hopefully be able to address in the talk, I strongly believe that the future of MIR resides in the tandem of deep information processing architectures and automatic feature learning. I’m wholly convinced that the potential to solve difficult problems in music informatics and, more broadly, artificial intelligence will come on the heels of advances in deep learning, and that we as a community both can and should be at the bleeding edge – rather than the lagging tail – of these breakthroughs.

Now more than ever, it is critical that we have this discussion at length. Last year, just before ISMIR, I was chatting with a colleague about the impending conference which, as you’ll recall, was in Miami. When I asked what they were looking forward to, as I couldn’t attend, the answer was sobering: “To be honest? The beach. I’m not sure I can sit through a bunch of presentations about someone’s fancy new feature extractor.” I found the insight particularly poignant, not because it was so perfectly candid, but because it succinctly captured an increasingly prevalent sentiment: content-based MIR is getting stale.

Since we’re being candid, I feel it’s hardly a risk to say that a variety of research areas simply feel stuck. I’m not alone when I doubt that genre classification on MFCCs would just work if only someone devised a more powerful classifier. Do we really believe human-comparable chord detection can be solved with short-time chroma features, and that the answer lies in more complicated post-filtering methods? Even more concerning is the fact that these are just the tasks we think we understand. How will these methods scale to artist recognition with more than a few dozen unique bands? Is this really the future of MIR?

The truth is, we have other options now. Methods that were once overblown and disappointingly ineffective are finally practical and delivering on old promises. Discussions of “neural networks” often have a polarizing effect, though, splitting people into the two familiar camps of evangelists and skeptics; many have already chosen sides. One of my goals, therefore, going into the next week and beyond is to encourage this dialogue as a single group with a common objective, in hallways and hotels, around coffee carafes and bar stools, because I believe it is an incredibly important one to have. We need to consider our options with an open mind. We need to question our reasoning and challenge our assumptions. What’s more, we need to be honest.

If it wasn’t already apparent, I’m enthused by our place at a collective crossroads. In the interest of making the most of our shared time in Portugal, it is my hope that some mildly provocative text may catalyze a reaction we can transform into productive conversation. I encourage any and all responses here, and look forward to seeing everyone in Porto.
