[Unity] Dynamically creating an AudioClip from a WAV file (parsing the WAVE file structure)

Introduction

In this post I would like to write about how to create an AudioClip from a WAV file.

https://docs.unity3d.com/ja/2018.4/ScriptReference/AudioClip.Create.html
https://docs.unity3d.com/ja/2018.4/ScriptReference/AudioClip.SetData.html

Introduction
The WAV file used
Behavior on import
Structure of a WAV file
- Without a fact chunk (typical)
- With a fact chunk (special case)
Reading and parsing the WAVE file
Creating the AudioClip
Code

The WAV file used

For this experiment, I used a .wav file from the following site.
Site: otosozai.com https://otosozai.com/

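Behavior on import

Before creating an AudioClip dynamically, a reminder: as most of you probably know, when you import a .wav file into Unity, it is automatically converted to an AudioClip.

However, files placed under the StreamingAssets folder are kept as raw data, so they are not converted to an AudioClip.
https://docs.unity3d.com/ja/2022.1/Manual/StreamingAssets.html

The technique introduced here should also be useful in other cases, for example when you generate a .wav dynamically and then want to play it.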

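Structure of a WAV file

WAVE files are apparently stored in a container format called RIFF, and have the structure described in the following references.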
http://soundfile.sapp.org/doc/WaveFormat/
http://www13.plala.or.jp/kymats/study/MULTIMEDIA/load_wave.html
http://shopping2.gmobb.jp/htdmnr/www08/asp/wav.html
http://www.graffiti.jp/pc/p030506a.htm

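The basic layout seems to be the one in the first article above, but there appear to be several variants. The file also contains a few four-character identifiers such as "RIFF", "WAVE", "fmt " (with a trailing space), "fact", and "data". These are stored as ASCII bytes in reading order (the first article describes this as big-endian), while the other numeric fields appear to be little-endian.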
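To make the byte-order point concrete, here is a minimal sketch of reading one identifier and one numeric field by hand. `WavEndianSketch` and its method names are hypothetical and only for illustration; the parser later in this article uses BitConverter instead.

```csharp
using System;
using System.Text;

public static class WavEndianSketch
{
    // A chunk ID is four ASCII bytes, read in the order they appear in the file.
    public static string ReadChunkId(byte[] bytes, int offset)
        => Encoding.ASCII.GetString(bytes, offset, 4);

    // A numeric field is little-endian: the least significant byte comes first.
    public static int ReadInt32LittleEndian(byte[] bytes, int offset)
        => bytes[offset]
           | (bytes[offset + 1] << 8)
           | (bytes[offset + 2] << 16)
           | (bytes[offset + 3] << 24);
}
```

For example, the bytes `52 49 46 46` read as "RIFF", and the bytes `24 08 00 00` read as the integer 0x00000824.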

Microsoft WAVE soundfile format

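Without a fact chunk (typical)

Offset	Bytes	Contents
0	4	"RIFF"
4	4	File size in bytes - 8
8	4	"WAVE"
12	4	"fmt " (with a trailing space)
16	4	Size of the fmt chunk in bytes. `16` for linear PCM
20	2	Format ID. `1` for linear PCM
22	2	Number of channels. `1` for mono, `2` for stereo
24	4	Sampling rate
28	4	Data rate (bytes/sec)
32	2	Block size (bytes per sample * number of channels)
34	2	Bits per sample
(36)	2	Size of the extension. Absent for linear PCM
(38)	n	Extension. Absent for linear PCM
36	4	"data"
40	4	Size of the waveform data in bytes
44	n	Waveform data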
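As a cross-check of the offsets in this table, the following sketch writes a canonical 44-byte linear PCM header followed by the waveform data. It is only an illustration under the assumption that there is no extension and no fact chunk; `WavHeaderSketch` and `BuildPcmWav` are hypothetical names, not part of Unity or .NET.

```csharp
using System;
using System.IO;
using System.Text;

public static class WavHeaderSketch
{
    // Builds a minimal linear PCM WAV: 44-byte header + waveform data.
    public static byte[] BuildPcmWav(short channels, int sampleRate, short bitsPerSample, byte[] pcm)
    {
        var blockAlign = (short)(channels * bitsPerSample / 8);

        using var stream = new MemoryStream();
        using var writer = new BinaryWriter(stream); // BinaryWriter writes integers as little-endian

        writer.Write(Encoding.ASCII.GetBytes("RIFF")); // offset 0
        writer.Write(36 + pcm.Length);                 // offset 4: file size - 8
        writer.Write(Encoding.ASCII.GetBytes("WAVE")); // offset 8
        writer.Write(Encoding.ASCII.GetBytes("fmt ")); // offset 12
        writer.Write(16);                              // offset 16: fmt chunk size
        writer.Write((short)1);                        // offset 20: format ID (linear PCM)
        writer.Write(channels);                        // offset 22
        writer.Write(sampleRate);                      // offset 24
        writer.Write(sampleRate * blockAlign);         // offset 28: data rate (bytes/sec)
        writer.Write(blockAlign);                      // offset 32: block size
        writer.Write(bitsPerSample);                   // offset 34
        writer.Write(Encoding.ASCII.GetBytes("data")); // offset 36
        writer.Write(pcm.Length);                      // offset 40: waveform data size
        writer.Write(pcm);                             // offset 44: waveform data
        writer.Flush();

        return stream.ToArray();
    }
}
```

A byte array produced this way has the same layout as the table, so it can be fed to the parsing code in this article when testing without a real file.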

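With a fact chunk (special case)

Offset	Bytes	Contents
0	4	"RIFF"
4	4	File size in bytes - 8
8	4	"WAVE"
12	4	"fmt " (with a trailing space)
16	4	Size of the fmt chunk in bytes. `16` for linear PCM
20	2	Format ID. `1` for linear PCM
22	2	Number of channels. `1` for mono, `2` for stereo
24	4	Sampling rate
28	4	Data rate (bytes/sec)
32	2	Block size (bytes per sample * number of channels)
34	2	Bits per sample
(36)	2	Size of the extension. Absent for linear PCM
(38)	n	Extension. Absent for linear PCM
36	4	"fact"
40	4	Size of the fact chunk in bytes
44	4	Total number of samples
48	4	"data"
52	4	Size of the waveform data in bytes
56	n	Waveform data

Frankly, for linear PCM the information inside the fact chunk does not seem to matter much, so it can simply be skipped.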
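Because the fact chunk (and other optional chunks such as LIST) can be skipped, one way to cope with both layouts is to walk the chunks generically instead of relying on fixed offsets. The following is a minimal sketch of that idea, not the parser used in this article. `WavChunkSketch` and `FindChunk` are hypothetical names, and the sketch assumes the whole file is already in a byte array on a little-endian platform.

```csharp
using System;
using System.Text;

public static class WavChunkSketch
{
    // Returns the offset of the payload of the first chunk with the given ID,
    // or -1 if it is not found. `size` receives the payload size in bytes.
    public static int FindChunk(byte[] file, string id, out int size)
    {
        var offset = 12; // skip "RIFF", the RIFF size, and "WAVE"

        while (offset + 8 <= file.Length)
        {
            var chunkId = Encoding.ASCII.GetString(file, offset, 4);
            var chunkSize = BitConverter.ToInt32(file, offset + 4);
            if (chunkSize < 0) break; // malformed or too large for this sketch

            if (chunkId == id)
            {
                size = chunkSize;
                return offset + 8;
            }

            // RIFF chunks are padded to an even number of bytes.
            var next = (long)offset + 8 + chunkSize + (chunkSize & 1);
            if (next > file.Length) break;
            offset = (int)next;
        }

        size = 0;
        return -1;
    }
}
```

With this approach, `FindChunk(file, "data", out var size)` returns 44 for the first layout and 56 for the layout with a fact chunk, without any special handling for fact.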

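Reading and parsing the WAVE file

How you read the WAVE file is up to you, but as a sample, let's read it from StreamingAssets.

// Note: on Android, StreamingAssets files are packed inside the APK, so use UnityWebRequest instead of File
var fileBytes = File.ReadAllBytes(Path.Combine(Application.streamingAssetsPath, "se_sac08.wav"));

Next, let's parse the WAVE file.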

using var memoryStream = new MemoryStream(fileBytes);

// RIFF
var riffBytes = new byte[4];
memoryStream.Read(riffBytes);
if (riffBytes[0] != 0x52 || riffBytes[1] != 0x49 || riffBytes[2] != 0x46 || riffBytes[3] != 0x46) throw new ArgumentException("fileBytes is not the correct Wav file format.");
        
// chunk size
var chunkSizeBytes = new byte[4];
memoryStream.Read(chunkSizeBytes);
var chunkSize = BitConverter.ToInt32(chunkSizeBytes);
        
// WAVE
var wavBytes = new byte[4];
memoryStream.Read(wavBytes);
if (wavBytes[0] != 0x57 || wavBytes[1] != 0x41 || wavBytes[2] != 0x56 || wavBytes[3] != 0x45) throw new ArgumentException("fileBytes is not the correct Wav file format.");
        
// fmt
var fmtBytes = new byte[4];
memoryStream.Read(fmtBytes);
if (fmtBytes[0] != 0x66 || fmtBytes[1] != 0x6d || fmtBytes[2] != 0x74 || fmtBytes[3] != 0x20) throw new ArgumentException("fileBytes is not the correct Wav file format.");

// fmtSize
var fmtSizeBytes = new byte[4];
memoryStream.Read(fmtSizeBytes);
var fmtSize = BitConverter.ToInt32(fmtSizeBytes);
        
// AudioFormat
var audioFormatBytes = new byte[2];
memoryStream.Read(audioFormatBytes);
var isPCM = audioFormatBytes[0] == 0x1 && audioFormatBytes[1] == 0x0;
        
// NumChannels   Mono = 1, Stereo = 2
var numChannelsBytes = new byte[2];
memoryStream.Read(numChannelsBytes);
var channels = (int)BitConverter.ToUInt16(numChannelsBytes);
        
// SampleRate
var sampleRateBytes = new byte[4];
memoryStream.Read(sampleRateBytes);
var sampleRate = BitConverter.ToInt32(sampleRateBytes);
        
// ByteRate (=SampleRate * NumChannels * BitsPerSample/8)
var byteRateBytes = new byte[4];
memoryStream.Read(byteRateBytes);
        
// BlockAlign (=NumChannels * BitsPerSample/8)
var blockAlignBytes = new byte[2];
memoryStream.Read(blockAlignBytes);
        
// BitsPerSample
var bitsPerSampleBytes = new byte[2];
memoryStream.Read(bitsPerSampleBytes);
var bitPerSample = BitConverter.ToUInt16(bitsPerSampleBytes);

// Discard Extra Parameters
if(fmtSize > 16) memoryStream.Seek(fmtSize - 16, SeekOrigin.Current);
        
// Data
var subChunkIDBytes = new byte[4];
memoryStream.Read(subChunkIDBytes);

// If fact exists, discard fact
if (subChunkIDBytes[0] == 0x66 && subChunkIDBytes[1] == 0x61 && subChunkIDBytes[2] == 0x63 && subChunkIDBytes[3] == 0x74)
{
    var factSizeBytes = new byte[4];
    memoryStream.Read(factSizeBytes);
    var factSize = BitConverter.ToInt32(factSizeBytes);
    memoryStream.Seek(factSize, SeekOrigin.Current);
    memoryStream.Read(subChunkIDBytes);
}
if (subChunkIDBytes[0] != 0x64 || subChunkIDBytes[1] != 0x61 || subChunkIDBytes[2] != 0x74 || subChunkIDBytes[3] != 0x61) throw new ArgumentException("fileBytes is not the correct Wav file format.");

// dataSize (=NumSamples * NumChannels * BitsPerSample/8)
var dataSizeBytes = new byte[4];
memoryStream.Read(dataSizeBytes);
var dataSize = BitConverter.ToInt32(dataSizeBytes);

var data = new byte[dataSize];
memoryStream.Read(data);

It might be more efficient to skip MemoryStream and instead pass an array index to BitConverter, or hand it a Span over the relevant bytes, but the code above does work as it is.
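As a rough sketch of the Span-based alternative, the fmt fields can be read straight from the file bytes without intermediate arrays. This assumes the "fmt " chunk starts at byte 12 (as in the tables above) and that `System.Buffers.Binary` is available in your Unity API compatibility level. `WavFmtSketch` and `ReadFmt` are hypothetical names.

```csharp
using System;
using System.Buffers.Binary;

public static class WavFmtSketch
{
    // Reads the fmt fields at their canonical offsets, with no temporary arrays.
    public static (ushort Format, ushort Channels, int SampleRate, ushort BitsPerSample) ReadFmt(ReadOnlySpan<byte> file)
    {
        var format = BinaryPrimitives.ReadUInt16LittleEndian(file.Slice(20, 2));
        var channels = BinaryPrimitives.ReadUInt16LittleEndian(file.Slice(22, 2));
        var sampleRate = BinaryPrimitives.ReadInt32LittleEndian(file.Slice(24, 4));
        var bitsPerSample = BinaryPrimitives.ReadUInt16LittleEndian(file.Slice(34, 2));
        return (format, channels, sampleRate, bitsPerSample);
    }
}
```

The index-based BitConverter overloads, such as `BitConverter.ToInt32(fileBytes, 24)`, achieve much the same thing, with the difference that BitConverter follows the endianness of the machine while BinaryPrimitives always reads little-endian.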

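In fact, if you are reading directly from a file, using FileStream is probably the best option, since the results can be read straight into memory without first loading the whole file into a byte array. If this matters to you, try changing just that part.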
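A stream-based version might look roughly like the sketch below, which reads only the chunk headers and the waveform data. It is an illustration under a few assumptions: the stream is seekable (FileStream is), fmt parsing is omitted for brevity, and `WavStreamSketch` and `ReadDataChunk` are hypothetical names.

```csharp
using System;
using System.IO;
using System.Text;

public static class WavStreamSketch
{
    // Walks the chunks of a seekable stream and returns the payload of the "data" chunk.
    public static byte[] ReadDataChunk(Stream stream)
    {
        using var reader = new BinaryReader(stream, Encoding.ASCII, leaveOpen: true);

        if (ReadId(reader) != "RIFF") throw new InvalidDataException("Missing RIFF header.");
        reader.ReadInt32(); // RIFF size (file size - 8), not needed here
        if (ReadId(reader) != "WAVE") throw new InvalidDataException("Missing WAVE identifier.");

        while (true)
        {
            var id = ReadId(reader);
            var size = reader.ReadInt32();
            if (size < 0) throw new InvalidDataException("Chunk too large for this sketch.");
            if (id == "data") return reader.ReadBytes(size);

            // Skip other chunks (fmt, fact, ...), including the pad byte after odd sizes.
            stream.Seek((long)size + (size & 1), SeekOrigin.Current);
        }
    }

    private static string ReadId(BinaryReader reader)
    {
        var bytes = reader.ReadBytes(4);
        if (bytes.Length < 4) throw new InvalidDataException("Unexpected end of file.");
        return Encoding.ASCII.GetString(bytes);
    }
}
```

It could be called as `using var fs = File.OpenRead(path); var data = WavStreamSketch.ReadDataChunk(fs);`. In a real parser the "fmt " branch would read the channel count, sampling rate, and bits per sample instead of skipping them.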

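Just note that you need to handle the possible presence of extra parameters in the fmt chunk and of a fact chunk.

Creating the AudioClip

Finally, we convert the data a little so that an AudioClip can be created from it.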

private static AudioClip CreateAudioClip(byte[] data, int channels, int sampleRate, UInt16 bitPerSample, string audioClipName)
{
    var audioClipData = bitPerSample switch
    {
        8 => Create8BITAudioClipData(data),
        16 => Create16BITAudioClipData(data),
        32 => Create32BITAudioClipData(data),
        _ => throw new ArgumentException($"bitPerSample is not supported : bitPerSample = {bitPerSample}")
    };

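    // AudioClip.Create takes the number of sample frames (samples per channel),
    // while audioClipData holds the interleaved samples of all channels.
    var audioClip = AudioClip.Create(audioClipName, audioClipData.Length / channels, channels, sampleRate, false);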
    audioClip.SetData(audioClipData, 0);
    return audioClip;
}

// 8-bit PCM in WAV files is unsigned (0 to 255, with 128 as silence), so re-center before scaling to -1..1.
private static float[] Create8BITAudioClipData(byte[] data)
    => data.Select(x => (x - 128) / 128f).ToArray();

private static float[] Create16BITAudioClipData(byte[] data)
{
    var audioClipData = new float[data.Length / 2];

    // Read each little-endian 16-bit sample straight from the array; a trailing odd byte is ignored.
    for (var i = 0; i < audioClipData.Length; i++)
    {
        audioClipData[i] = (float) BitConverter.ToInt16(data, i * 2) / short.MaxValue;
    }

    return audioClipData;
}

// Assumes 32-bit integer PCM (format ID 1), not 32-bit IEEE float.
private static float[] Create32BITAudioClipData(byte[] data)
{
    var audioClipData = new float[data.Length / 4];

    // Read each little-endian 32-bit sample straight from the array; trailing leftover bytes are ignored.
    for (var i = 0; i < audioClipData.Length; i++)
    {
        audioClipData[i] = (float) BitConverter.ToInt32(data, i * 4) / int.MaxValue;
    }

    return audioClipData;
}
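One detail that is easy to get wrong is the length passed to AudioClip.Create, which Unity documents as the number of sample frames (samples per channel) rather than the total number of values in the float array. The arithmetic can be sketched as follows. `WavLengthSketch` and its methods are hypothetical helpers, and the sketch assumes uncompressed PCM.

```csharp
using System;

public static class WavLengthSketch
{
    // Number of sample frames (samples per channel) in a PCM data chunk.
    public static int SampleFrames(int dataSize, int channels, int bitsPerSample)
    {
        var blockAlign = channels * bitsPerSample / 8; // bytes per sample frame
        return dataSize / blockAlign;
    }

    // Playback length in seconds for the given number of sample frames.
    public static double DurationSeconds(int sampleFrames, int sampleRate)
        => (double) sampleFrames / sampleRate;
}
```

For a 16-bit stereo file, the converted float array has dataSize / 2 elements, which is twice the number of sample frames.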