Waveform videos, for reasons (with ffmpeg), part 2

I found myself wanting a "contrived video" whose requirements differ just slightly from last time's.

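The idea is almost the same as last time, but this time I want a video with these properties:

  1. The picture and the sound are in sync, same as last time
  2. In addition, it is easy to cut parts out of it (the "edit points are easy to spot")
  3. And the elapsed time is recorded in it as text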

First, a dumb script that is only slightly rewritten from last time's:

    # -*- coding: utf-8 -*-
    from os import path
    import sys
    import io
    import subprocess


    # =============================================
    ifn = sys.argv[1]  # input text file (cp932), one utterance per line

    # TTS voice to use (must be installed on this machine)
    voice = "Microsoft Server Speech Text to Speech Voice (en-US, ZiraPro)"

    # temporary PowerShell script that drives the synthesizer
    fn = "tmp.ps1"
    fo = io.open(fn, "w", encoding="cp932")
    # ---------------------------------------------
    # NOTE: SetOutputToWaveFile writes WAV data; the ".mp3" name is kept as is.
    fo.write(u"""\
    [Reflection.Assembly]::LoadWithPartialName("Microsoft.Speech")
    $speak = New-Object Microsoft.Speech.Synthesis.SpeechSynthesizer
    $speak.Rate = -3  # from -10 to 10, default is zero.
    $speak.SetOutputToWaveFile("{}.mp3")
    """.format(path.join(path.abspath("."), path.basename(ifn))))
    # ---------------------------------------------

    # ---------------------------------------------
    # The text must not contain double quotes, "$", or XML special characters,
    # because it is pasted into a double-quoted PowerShell string as SSML.
    text = open(ifn, "rb").read().decode("cp932").strip()
    fo.write(u"""\
    $speak.SelectVoice("{}")
    """.format(voice))
    fo.write(u"""\
    $speak.SpeakSsml("
    <speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xml:lang='en-US'>
    """)
    # one <s> sentence per input line, each followed by a 2-second pause
    for s in text.splitlines():
        fo.write(u"""\
    <s>{}</s><break time='2s'/>
    """.format(s))
    fo.write(u"""</speak>
    ")
    """)
    # ---------------------------------------------
    fo.write(u"""$speak.Dispose()
    """)
    fo.close()
    # ---------------------------------------------

    # =============================================
    # run the generated script
    subprocess.call([
            "C:/WINDOWS/SysWOW64/WindowsPowerShell/v1.0/powershell",
            path.abspath(fn)])
    # =============================================

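Last time I was not getting any real value out of the Speech Synthesis Markup Language (SSML), but this time I use it to insert a pause of arbitrary length after each line of the input text file. In fact, doing this without SSML is fairly painful, because you would have to manage with nothing but pause and resume.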
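As a minimal sketch of the per-line pause idea, the SSML body can be assembled as a plain string before it is handed to the synthesizer. The function name `build_ssml` and its `pause` and `lang` parameters are illustrative and not part of the original script. The XML escaping is an addition, on the assumption that the input text might contain `&` or `<`.

```python
from xml.sax.saxutils import escape


def build_ssml(lines, pause="2s", lang="en-US"):
    """Wrap each non-empty line in <s>...</s> followed by a fixed pause."""
    parts = [
        "<speak version='1.0' "
        "xmlns='http://www.w3.org/2001/10/synthesis' "
        "xml:lang='{}'>".format(lang)
    ]
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines instead of emitting empty sentences
        # escape() turns &, < and > into XML entities
        parts.append("<s>{}</s><break time='{}'/>".format(escape(line), pause))
    parts.append("</speak>")
    return "\n".join(parts)


if __name__ == "__main__":
    print(build_ssml(["Diamond: hardness 10.", "Bort & Diamond"]))
```

Changing `pause` per call is enough to lengthen or shorten the gaps, which is what makes the edit points easy to find later.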

Incidentally, this is the text I had it speak:

    Phosphophyllite: Known as 'Phos' for short. One of the weaker jewel people, with a hardness of 3.5. Declared as too weak for battle, Phos is tasked with creating an encyclopedia logging new information. As Phos keeps experiencing hardships in battle, their limbs, along with their memories, are gradually lost and replaced with other elements, including agate legs, arms made from gold and platinum alloy, a lapis lazuli head and a synthetic pearl eye.
    Cinnabar: An aloof jewel person who is even weaker than Phos, with a hardness of 2, but carries a powerful poison in their body. Because this poison taints the environment and erases memories stored inside affected jewel shards, Cinnabar is kept on night watch, but yearns to escape the night.
    Diamond: A kind-hearted jewel person who has the maximum hardness of 10, but is still fragile against enemy attacks.
    Bort: An intimidating diamond-class jewel person who is powerful in combat and is the most durable of the jewels. They are also very protective of Diamond.
    Morganite: A haughty jewel person who is very confident in their fighting capabilities.
    Goshenite: A sweet and responsible jewel person. Morganite's partner.
    Rutile: The medic jewel person in charge of fixing the others jewels when they're broken, although they have a habit of wanting to dissect what catches their attention.
    Jade: A jewel person who works as secretary for Master Kongo. Euclase's partner.
    Red Beryl: A jewel person in charge of designing and fixing the outfits of the other jewels, changing their hairstyle frequently.
    Amethyst: Crystal twinning.'84' and '33'. Twin jewels that are always together and act in synchrony while talking and fighting. They also excel in sword fighting.
    Benitoite: A jewel person who is incapable of saying 'no' to people who ask for their help.
    Neptunite: A young jewel person with a sharp tongue. Benitoite's partner.
    Zircon: The second youngest jewel person after Phos, and Yellow Diamond's partner.
    Obsidian: A jewel person in charge of manufacturing the tools, weapons and daily items for the other jewels.
    Yellow Diamond: The oldest of the jewel people, and the fastest. Zircon's partner.
    Euclase: One of the oldest gems, they're wise and kind. Jade's partner.
    Alexandrite: A jewel person with an obsession with understanding the Lunarians, although she has a tendency to turn berserk when they see one. As such, Master Kongo has forbidden them from fighting.
    Peridot: Papermaker, obsessed with their job.
    Antarcticite: A jewel who only appears during the winter, when the other jewels undergo hibernation. They have become much sturdier thanks to the cold.
    Sphene: Artisan, a sweet and calm jewel person. Peridot's partner.
    Watermelon Tourmaline: Energetic younger jewel person.
    Hemimorphite: Known as Hemimor, a fighter and Watermelon Tourmaline's partner.
    Heliodor: A golden-colored jewel person, taken by the Lunarians prior the beginning of the story.
    Padparadscha: Rutile's partner. A jewel person nearly as old and strong as the diamonds, who was born incomplete, hence spending most of their time asleep.
    Master Kongo (Kongou-sensei): A powerful monk who oversees the jewel people and acts as a father figure to them. He is much sturdier than the jewels and when meditating or sleeping, he scarcely wakes up. His outer shell is composed of hexagonal diamond (lonsdaleite).

Then, as before, I turn the MP3 made this way into a waveform video:

This time I tried mode=cline
    me@host: ~$ ffmpeg -y -i input.mp3 \
    >  -filter_complex "[0:a]showwaves=s=960x540:mode=cline,format=yuv420p[v]" \
    >  -map "[v]" -map 0:a -c:v libx264 -c:a aac \
    >  output.mp4
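If the same command needs to be issued from Python, for example to try several `mode` values, the argument list can be built as below. This is only a sketch: `showwaves_command` is a hypothetical helper, and the example prints the command rather than running it, so it does not assume that ffmpeg is installed.

```python
import shlex


def showwaves_command(src, dst, size="960x540", mode="cline"):
    """Return the ffmpeg argument list for a showwaves rendering of src."""
    graph = "[0:a]showwaves=s={}:mode={},format=yuv420p[v]".format(size, mode)
    return [
        "ffmpeg", "-y", "-i", src,
        "-filter_complex", graph,
        "-map", "[v]", "-map", "0:a",
        "-c:v", "libx264", "-c:a", "aac",
        dst,
    ]


if __name__ == "__main__":
    cmd = showwaves_command("input.mp3", "output.mp4")
    # Print a shell-quoted form; pass cmd to subprocess.call() to run it.
    print(" ".join(shlex.quote(a) for a in cmd))
```

Passing the list to `subprocess.call` avoids the shell quoting that the filter graph otherwise needs.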

This time that is not the end of it: I also want to embed elapsed-time information into this video.

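Since ffmpeg can overlay one video on another, the plan is to make a video that does nothing but tick off the elapsed time and then overlay it. So first, a "dumb script that makes a video that only ticks off elapsed time" (only slightly rewritten from an earlier script):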

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    import argparse

    from PIL import Image, ImageDraw, ImageFont
    import av  # https://github.com/mikeboers/PyAV


    def _drawtextimage(width, height, txt1, fnt1, txt2, fnt2):
        """Draw txt1 centered (blue) and txt2 at the bottom-left (black)."""
        img = Image.new("RGB", (width, height), "#ffffff")
        dctx = ImageDraw.Draw(img)

        # NOTE: textsize() was removed in Pillow 10; newer versions need textbbox().
        txtsz = dctx.textsize(txt1, fnt1)
        dctx.text(
            (img.width / 2 - txtsz[0] / 2, img.height / 2 - txtsz[1] / 2),
            txt1, font=fnt1, fill="#0000ff")

        txtsz = dctx.textsize(txt2, fnt2)
        dctx.text(
            (0, img.height - txtsz[1]),
            txt2, font=fnt2, fill="#000000")

        del dctx  # destroy drawing context
        return img


    def _tovideo(args):
        ocont = av.open(args.outfile, "w")
        vrate = 12  # fps
        vstream = ocont.add_stream('h264', rate=vrate)
        vstream.width = 960
        vstream.height = 540
        vstream.pix_fmt = 'yuv420p'

        fnt1 = ImageFont.truetype('msmincho.ttc', 384)  # MM:SS clock
        fnt2 = ImageFont.truetype('msmincho.ttc', 128)  # frame counter

        for t in range(60 * 6 + 20):  # 6 min 20 s in total
            tstr = "%02d:%02d" % (t // 60, t % 60)
            # one frame per tick, numbered 1..vrate within each second
            for i in range(1, vrate + 1):
                img = _drawtextimage(
                    vstream.width, vstream.height, tstr, fnt1, "%2d" % i, fnt2)
                vframe = av.VideoFrame.from_image(img)
                for p in vstream.encode(vframe):
                    ocont.mux(p)
        try:
            # flush the rest in queue.
            for p in vstream.encode():
                ocont.mux(p)
        except av.AVError:  # End Of File
            pass
        ocont.close()  # MUST!


    if __name__ == '__main__':
        parser = argparse.ArgumentParser()
        parser.add_argument("--outfile", default="time_video.mp4")
        args = parser.parse_args()

        _tovideo(args)
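The labelling logic of the nested loop in the script above can be checked on its own, without Pillow or PyAV. In this sketch `frame_labels` is a hypothetical helper that yields the two strings drawn on each frame: the MM:SS clock and the frame counter within the current second.

```python
def frame_labels(total_seconds, fps):
    """Yield (clock, tick) label pairs, one pair per video frame."""
    for t in range(total_seconds):
        clock = "%02d:%02d" % (t // 60, t % 60)
        for i in range(1, fps + 1):
            yield clock, "%2d" % i


if __name__ == "__main__":
    labels = list(frame_labels(60 * 6 + 20, 12))
    print(len(labels))            # 380 s * 12 fps -> 4560
    print(labels[0], labels[-1])  # ('00:00', ' 1') ('06:19', '12')
```

Because the counter restarts every second, a cut made at a frame showing " 1" always lands on a whole-second boundary.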

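After that, all that remains is the overlay. With the waveform video as i1.mp4 and the elapsed-time video as i2.mp4:

As usual, this is a shell script only because typing it all out on the command line is a pain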
    #! /bin/sh

    ffmpeg -y -i i1.mp4 -i i2.mp4 -filter_complex "
    [1:v]scale=iw/4:-1[1v];
    [0:v][1v]overlay=(W - w - 50):(H - h - 50)[v]" \
      -map '[v]' -map '0:a' -ac 2 out.mp4
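To see where the clock ends up, the arithmetic behind `scale=iw/4:-1` and `overlay=(W - w - 50):(H - h - 50)` can be worked through by hand. The sketch below assumes both inputs are 960x540, as produced by the two earlier steps. `overlay_geometry` is a hypothetical helper, and the rounding of the scaled height is an approximation of what ffmpeg does for `-1`.

```python
def overlay_geometry(main_size, inset_size, shrink=4, margin=50):
    """Return ((w, h), (x, y)): scaled inset size and its top-left position."""
    W, H = main_size
    iw, ih = inset_size
    w = iw // shrink            # scale=iw/4
    h = round(w * ih / iw)      # ":-1" keeps the aspect ratio
    x = W - w - margin          # overlay x = W - w - 50
    y = H - h - margin          # overlay y = H - h - 50
    return (w, h), (x, y)


if __name__ == "__main__":
    print(overlay_geometry((960, 540), (960, 540)))
    # ((240, 135), (670, 355))
```

So the clock is shrunk to 240x135 and sits 50 pixels in from the bottom-right corner of the waveform video.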