The research project Artificial Intelligence as a Collaborative Partner in Music explores the intersection of AI and music-making. Göran Wretling, a multi-instrumentalist, composer, and producer, examines the opportunities and challenges that arise when humans and AI collaborate in music composition and music production.
Previous research shows that although AI tools offer new possibilities for creative expression, they also bring challenges such as cultural bias and a potential homogenization of works due to Eurocentric training data.
Research questions
The research topic is music composition and music production through human-AI interaction. The project has aimed to explore the opportunities and outcomes of collaboration between a human composer and AI tools for musical composition and music production, guided by the following research questions:
Experiment 1 – Tradition meets AI technology
My initial experimental phase focused on the encounter between musical tradition and AI technology. I used the generative folk music tool Folk RNN as a foundation and developed musical works from its output. These works were then passed on to the musician and national fiddler Thomas von Wachenfeldt for interpretation within his genre specialization. Thomas made recordings based on his own interpretations of the AI-collaborative folk music material.
Experiment 2A & 2B – Code, music and catalyst effect
My second experimental phase explored the interplay between musical idea, perceived competence, and musical outcome as tools for composition and production. Starting from the idea of a specific musical sound, I chose to explore and implement ChatGPT as a collaborative partner for coding and programming a Web Audio API-based music sequencer tool (SVP Studio). In the process, a clear catalyst effect of AI integration emerged: as a composer with comprehensive experience in the musical domain but significantly less experience in programming, I was able to engage in collaborative coding with ChatGPT as a catalyst for my creative process and coding skills. The work resulted in a music application comprising 70,000 characters of code.
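SVP Studio's own code is not reproduced in this report, but the scheduling core of a Web Audio API step sequencer can be sketched. The following is a minimal, hypothetical illustration (function names and parameters are my own, not taken from SVP Studio) of the standard "lookahead" pattern such sequencers use: pure timing arithmetic decides when each step should sound, and a tick function schedules every step that falls inside a short window ahead of the audio clock.

```javascript
// Hypothetical sketch of step-sequencer timing, not SVP Studio's actual code.
// Time (in seconds from sequence start) at which a given step should sound.
function stepTime(bpm, stepIndex, stepsPerBeat = 4) {
  const secondsPerBeat = 60 / bpm;
  return stepIndex * (secondsPerBeat / stepsPerBeat);
}

// The Web Audio "lookahead" pattern: each tick schedules all steps whose
// time falls within `lookahead` seconds of ctx.currentTime. In a browser,
// `ctx` would be an AudioContext and `playStep` would create and start
// audio nodes at the given time.
function makeScheduler(ctx, startTime, bpm, pattern, playStep, lookahead = 0.1) {
  let next = 0;
  return function tick() {
    while (startTime + stepTime(bpm, next) < ctx.currentTime + lookahead) {
      const step = next % pattern.length;
      if (pattern[step]) playStep(step, startTime + stepTime(bpm, next));
      next++;
    }
  };
}
```

Keeping the timing arithmetic separate from the audio calls is what makes this pattern robust: the tick function can be driven by an ordinary timer, while the actual sound events are placed on the sample-accurate audio clock.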
The next step in this field was to continue the AI-collaborative coding work and build a digital improvisation tool (Soundscaper) that is partly based on randomization. The system offers various adjustable randomization/register functions and comprises approximately 40,000 characters of code. The tool is applicable regardless of instrumentation or number of musicians and was created as a follow-up to the AI-integrated improvisation work.
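Soundscaper's implementation is not shown here, but the idea of an adjustable randomization/register function can be sketched in a few lines. The example below is an assumption on my part, not Soundscaper's code: a small seeded random generator (mulberry32, a well-known PRNG) draws MIDI notes uniformly within a user-adjustable register, so that a given seed reproduces the same phrase.

```javascript
// Hypothetical sketch of register-bounded randomization; not Soundscaper's code.
// mulberry32: a small deterministic PRNG, so results are reproducible.
function mulberry32(seed) {
  return function () {
    seed |= 0; seed = (seed + 0x6D2B79F5) | 0;
    let t = Math.imul(seed ^ (seed >>> 15), 1 | seed);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Draw `length` MIDI notes uniformly within the adjustable register [low, high].
function randomPhrase(rand, { low, high, length }) {
  const notes = [];
  for (let i = 0; i < length; i++) {
    notes.push(low + Math.floor(rand() * (high - low + 1)));
  }
  return notes;
}
```

Exposing `low` and `high` as performer-facing controls is what turns plain randomness into a musically useful register function: the player constrains where the chance material may land.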
Experiment 3A & 3B – Iterative process
Experimental phase three was initially based on self-recorded vocal material that was used both as raw material for cloning my voice via ElevenLabs’ deep cloning service and as auditory input to the AI-based tool MusicGen. The musical output from MusicGen was then used to compose a sound collage, and this material in turn became input for the generative tool Suno, whose output served as new compositional inspiration. The process highlights both the opportunities and the challenges of an AI-integrated iterative creative process. I also used the output created in experimental phase 2 as new input for subsequent AI-integrated iteration chains in continued composition and production work. The experiment challenges the boundaries of what can be considered original and of which processes may be regarded as musical creation.
Experiment 4A & 4B – AI and voice
My fourth experimental phase took its starting point in AI-based singing tools and their potential applications in composition and music production. I explored and implemented various genre-specific voice packages via the singing-based software instrument Synthesizer V. These clearly open up alternative computer-based composition and production possibilities that in no way replace live, real-time musicianship or interaction with other musicians, but rather serve as musical sketch tools. During this period I also searched for counterparts to Synthesizer V within the classical music tradition, where similar tools have not been available. The search led, in September 2024, to the composer and AI developer Richard DeCosta, who had initiated the development of the new AI singing tool Cantai. At that time the tool had not yet been released publicly, and it still has not, apart from extensive product marketing. I therefore became one of the first alpha testers in the world, maintaining continuous contact with the developer as a composer and product tester, and thereby gaining the opportunity to influence the product development process and implement the tool in my own music composition and production.
Experiment 5 – Improvisation and composition with AI collaboration
My fifth experimental phase concerned music composition as music improvisation with AI, an area where few tools with a direct link to improvisation appear to be available. I therefore contacted the composer and Max/MSP specialist Mike Lukaszuk, who has developed an AI-integrated music improvisation tool adaptable to both ensemble and solo improvisation. I set the framework of Experiment 5 as a real-time improvised piece in three movements: movement 1 (solo) involves keyboard and AI collaboration; movement 2 (duo) involves keyboard, Moog synthesizer, and AI collaboration; and movement 3 (trio) involves keyboard, Moog synthesizer, electric bass, and AI collaboration. Inspiration for composing this improvisational piece was also drawn from the musical output of the previously conducted Experiment 1.
The AI performance method implemented in this project is based on the following approach. A custom computer-music system was developed in Max/MSP, using the ml.star package for machine learning. The bass guitar signal was analyzed for fundamental frequency and loudness, and the results were converted to MIDI note and velocity data for neural network training. A self-organizing map (SOM) clustered similar pitches and velocities, mapping chord choices in a 2D space to predict note successions and repetition probabilities. A spatial encoder ranked notes by recency and frequency, enhancing dynamic sequencing. During our improvisations, real-time audio/MIDI analysis trained the neural networks, and the generated MIDI output was played through various VSTi plugins. To balance the high-tech machine learning approach, analog synthesizers were used in the live improvisation.
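The first analysis stage described above, converting fundamental frequency and loudness into MIDI note and velocity data, can be illustrated with the standard equal-temperament mapping. The sketch below is not the Max/MSP patch itself; the velocity mapping in particular (a 60 dB dynamic range) is an assumed parameter chosen for illustration.

```javascript
// Sketch of pitch/loudness-to-MIDI conversion; the actual Max/MSP patch
// is not reproduced here.

// MIDI note number from fundamental frequency (A4 = 440 Hz = note 69),
// using the standard equal-temperament formula 69 + 12*log2(f/440).
function freqToMidi(freq) {
  return Math.round(69 + 12 * Math.log2(freq / 440));
}

// MIDI velocity (1-127) from a linear amplitude in (0, 1], mapped through
// decibels with an assumed 60 dB usable dynamic range.
function ampToVelocity(amp, rangeDb = 60) {
  const db = 20 * Math.log10(Math.max(amp, 1e-6)); // 0 dB at full scale
  const scaled = Math.min(Math.max(1 + db / rangeDb, 0), 1);
  return Math.max(1, Math.round(scaled * 127));
}
```

Quantizing to the nearest MIDI note discards microtonal detail from the bass analysis, which is exactly what makes the data suitable for training on note successions rather than raw pitch curves.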
Experiment 6 – Solo improvisation and composition with AI collaboration
As a follow-up to the fifth experimental phase, I continued working with Mike Lukaszuk to further develop opportunities in the field of composition and improvisation from a solo instrumentalist’s perspective. The AI-integrated system was extended to enable improvisational music-making in symbiosis with the MPE polyphonic synthesizer Osmose by Expressive E. The further development also made it possible to control both software- and hardware-based musical parameters via foot pedals connected to the keyboard. This enabled me, as a solo musician, to start and stop the machine reading in real time and to activate or deactivate the machine-read musical material as an AI-collaborative musician in my own performance. All machine learning material in this music AI system development was based entirely on my own musical material.