We've created MuseNet, a deep neural network that can generate 4-minute musical compositions with 10 different instruments, and can combine styles from country to Mozart to the Beatles. MuseNet was not explicitly programmed with our understanding of music, but instead discovered patterns of harmony, rhythm, and style by learning to predict the next token in hundreds of thousands of MIDI files. MuseNet uses the same general-purpose unsupervised technology as GPT-2, a large-scale transformer model trained to predict the next token in a sequence, whether audio or text.

Since MuseNet knows many different styles, we can blend generations in novel ways. Here the model is given the first 6 notes of a Chopin Nocturne, but is asked to generate a piece in a pop style with piano, drums, bass, and guitar. The model manages to blend the two styles convincingly, with the full band joining in at around the 30-second mark. We're excited to see how musicians and non-musicians alike will use MuseNet to create new compositions!

In simple mode (shown by default), you'll hear random uncurated samples that we've pre-generated. Choose a composer or style, an optional start of a famous piece, and start generating. This lets you explore the variety of musical styles the model can create. In advanced mode you can interact with the model directly. The completions will take longer, but you'll be creating an entirely new piece.

The instruments you ask for are strong suggestions, not requirements. MuseNet generates each note by calculating the probabilities across all possible notes and instruments. The model shifts to make your instrument choices more likely, but there's always a chance it will choose something else. MuseNet has a more difficult time with odd pairings of styles and instruments (such as Chopin with bass and drums). Generations will be more natural if you pick instruments closest to the composer or band's usual style.

We created composer and instrumentation tokens to give more control over the kinds of samples MuseNet generates. During training, these composer and instrumentation tokens were prepended to each sample, so the model would learn to use this information in making note predictions. At generation time, we can then condition the model to create samples in a chosen style by starting with a prompt such as a Rachmaninoff piano start, or prompted with the band Journey, with piano, bass, guitar, and drums.

We can visualize the embeddings from MuseNet to gain insight into what the model has learned. Here we use t-SNE to create a 2-D map of the cosine similarity of various musical composer and style embeddings.

MuseNet uses the recompute and optimized kernels of Sparse Transformer to train a 72-layer network with 24 attention heads, with full attention over a context of 4096 tokens. This long context may be one reason why it is able to remember long-term structure in a piece, like in a sample imitating Chopin. It can also create musical melodic structures, as in a sample imitating Mozart.

Music generation is a useful domain for testing the Sparse Transformer, as it sits on a middle ground between text and images. It has the fluid token structure of text (in images you can look back N tokens and find the row above, whereas in music there's not a fixed number for looking back to the previous measure), yet we can easily hear whether the model is capturing long-term structure on the order of hundreds to thousands of tokens.
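The next-token objective the post describes (predicting the next token in a MIDI sequence) can be sketched numerically. This is a minimal illustration with a made-up toy "event" vocabulary and random scores, not MuseNet's actual encoding or model:

```python
import numpy as np

def next_token_loss(logits, tokens):
    """Average cross-entropy of predicting tokens[t+1] from position t.

    logits: (T, V) array of unnormalized scores, one row per position.
    tokens: (T,) array of integer token ids.
    """
    # Shift so that position t is scored against the token at t+1.
    preds, targets = logits[:-1], tokens[1:]
    # Numerically stable log-softmax per position.
    z = preds - preds.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

# Toy vocabulary of 4 hypothetical MIDI-style events; ids are invented.
tokens = np.array([0, 1, 2, 0, 1, 2])
rng = np.random.default_rng(0)
logits = rng.normal(size=(len(tokens), 4))  # stand-in model outputs
loss = next_token_loss(logits, tokens)
```

Training drives this loss down, which is how the model ends up discovering harmonic and rhythmic regularities without any hand-coded music theory.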
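The post says requested instruments are "strong suggestions, not requirements": the model's probabilities are shifted toward your choices without forbidding anything else. One way such a soft preference could work is adding a constant bias to the logits of preferred tokens before sampling. Everything here is a hedged sketch; the token ids and bias value are invented, not MuseNet internals:

```python
import numpy as np

def sample_with_instrument_bias(logits, preferred_ids, bias=2.0, rng=None):
    """Sample one token, nudging (not forcing) preferred tokens.

    Adding a constant to the chosen logits makes those tokens more
    likely while every token keeps nonzero probability, so the
    requested instruments stay suggestions rather than requirements.
    """
    rng = rng or np.random.default_rng()
    biased = logits.astype(float).copy()
    biased[preferred_ids] += bias
    z = biased - biased.max()
    probs = np.exp(z) / np.exp(z).sum()
    return rng.choice(len(logits), p=probs)

# Hypothetical conditioning prefix: composer/instrument tokens prepended
# to the sequence, as the post describes for training-time conditioning.
CHOPIN, PIANO = 901, 902   # made-up token ids
prompt = [CHOPIN, PIANO]   # generation would continue from this prefix
```

With a larger bias the preferred tokens dominate; with a small one they merely tilt the distribution, matching the "there's always a chance it will choose something else" behavior.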
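The t-SNE map mentioned above is built from cosine similarities between composer and style embeddings. A minimal sketch of that similarity computation, using random stand-in vectors since the real MuseNet embeddings are not shown here:

```python
import numpy as np

def cosine_similarity_matrix(emb):
    """Pairwise cosine similarity between embedding rows."""
    normed = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    return normed @ normed.T

# Stand-in embeddings for four styles (random, for illustration only).
styles = ["chopin", "mozart", "journey", "rachmaninoff"]
rng = np.random.default_rng(1)
emb = rng.normal(size=(len(styles), 16))
sim = cosine_similarity_matrix(emb)
# 1 - sim could then be fed to t-SNE as a precomputed distance
# to produce the 2-D map of the kind shown in the post.
```

Styles whose embeddings point in similar directions get similarity near 1 and land close together in the projected map.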