Words can generate music! What sets apart AudioCraft, the AI tool released by Meta?

  Cailian News, August 3 (Editor Niu Zhanlin) -- On Wednesday, Meta released an open-source artificial intelligence (AI) tool, AudioCraft, which helps users create music and audio from text prompts.

  (Source: Meta official website)

  Meta said the tool combines three models, namely AudioGen, EnCodec and MusicGen, and can generate high-quality, realistic audio and music from text.

  Meta said on its official website that MusicGen was trained on music owned by or specifically licensed to Meta and generates music from text prompts, while AudioGen was trained on public sound effects and generates audio from text prompts, such as a dog barking or footsteps. Combined with an improved version of the EnCodec codec, this lets users generate higher-quality audio more efficiently.

  In early June, Meta launched an open-source artificial intelligence model called MusicGen, a deep-learning language model that generates music from text prompts.

  Meta's EnCodec is an AI-driven audio codec based on deep learning that can compress audio to roughly one-tenth the size of the MP3 format without losing audio quality.
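The "ten times smaller" claim is straightforward bitrate arithmetic. As a rough illustration (assuming the figures Meta has cited for EnCodec's music compression research: a 64 kbps MP3 reference versus a 6 kbps EnCodec target; the function name is ours, not part of any Meta API):

```python
def size_bytes(bitrate_kbps: float, seconds: float) -> int:
    """Audio payload size in bytes for a constant-bitrate stream."""
    return int(bitrate_kbps * 1000 * seconds / 8)

# One minute of audio at each bitrate.
mp3_size = size_bytes(64, 60)      # 64 kbps MP3 reference
encodec_size = size_bytes(6, 60)   # 6 kbps EnCodec target

print(mp3_size)                        # 480000 bytes
print(encodec_size)                    # 45000 bytes
print(round(mp3_size / encodec_size))  # ~10x smaller
```

The actual codec is a learned neural encoder/decoder, so real file sizes vary with content, but the bitrate ratio is where the roughly tenfold figure comes from.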

  AudioGen is an artificial intelligence model from a research team at Meta and the Hebrew University of Jerusalem that generates audio from text input or extends existing audio. AudioGen can distinguish different sound sources and separate them acoustically.

  Meta also presented flowcharts for MusicGen and AudioGen and said the models will be open-sourced, so that researchers and practitioners can train their own models on their own datasets, helping to advance AI-generated audio and music.

  Compared with other music models, the AudioCraft family can generate high-quality music and audio that stays coherent over long durations, and it simplifies the overall design of audio generation models, making the tool simple and easy to use.

  Meta believes its models can lead a new wave of songs, just as the synthesizer changed music: "We think MusicGen can become a new kind of musical instrument, just like the first synthesizers."

  Of course, Meta also admitted that creating complex, high-quality music remains difficult, which is why it chose to open-source AudioCraft and diversify its training data.

  Earlier this year, Google also released a music generation model called MusicLM, which was opened to all users in May. Other common music generation models at present include Riffusion, Mousai and Noise2Music.