Building _parse_hl_lines: a live play-by-play

I want to try a little experiment in this post.

What follows is a live play-by-play of what goes on in my head.


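…"It would be great if hl_lines accepted ranges."

…"Something like hl_lines="1-3, 5"."

…"Or hl_lines="1-3, range(1, 10, 2)"."

…"So far I turned the input into a list on the PHP side, but leaving it as a string in PHP and parsing it in Python would be more flexible."

…"So I need a _parse_hl_lines."

…"Start with an empty _parse_hl_lines…" (the file is zzz.py)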

# -*- coding: utf-8 -*-
def _parse_hl_lines(fromform):
    pass

…"Feed it what I want to feed it…"

# -*- coding: utf-8 -*-
def _parse_hl_lines(fromform):
    pass

_parse_hl_lines("")
_parse_hl_lines("32 36-38")
_parse_hl_lines("32  36 - 38")
_parse_hl_lines("range(5) range(10, 15) range(20, 30, 2) 32 36-38")
_parse_hl_lines("range(5), range(10, 15), range(20, 30, 2), 32, 36-38")

…"Allowing both whitespace and commas as separators is coming back to bite me…"

# -*- coding: utf-8 -*-
import re


def _parse_hl_lines(fromform):
    print(re.sub(r'\s+', ' ', re.sub(r"\s+-\s+", "-", fromform)))

#_parse_hl_lines("")
_parse_hl_lines("32 36-38")
_parse_hl_lines("32  36 - 38")
_parse_hl_lines("range(5) range(10, 15) range(20, 30, 2) 32 36-38")
_parse_hl_lines("range(5), range(10, 15), range(20, 30, 2), 32, 36-38")

me@host: ~$ python zzz.py
32 36-38
32 36-38
range(5) range(10, 15) range(20, 30, 2) 32 36-38
range(5), range(10, 15), range(20, 30, 2), 32, 36-38

…"Good, the whitespace around the hyphens is gone…"

…"I just need to handle the range-style items and the rest separately. First, pull out the ranges…"
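
One thing the test inputs do not exercise: the pattern `\s+-\s+` requires whitespace on both sides of the hyphen, so a lopsided "36- 38" would pass through unchanged. The following is only a minimal sketch, assuming such input should be accepted as well; the helper name `normalize_spec` is hypothetical and not part of the script in this post.

```python
import re


def normalize_spec(spec):
    """Remove whitespace around hyphens, then collapse whitespace runs."""
    # \s*-\s* also matches when only one side (or neither) has whitespace.
    return re.sub(r"\s+", " ", re.sub(r"\s*-\s*", "-", spec))


print(normalize_spec("32  36 - 38"))
print(normalize_spec("32  36- 38"))
```

Both calls print the same normalized string, so the later split on whitespace and commas sees a single "36-38" token either way.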

# -*- coding: utf-8 -*-
import re


def _parse_hl_lines(fromform):
    not_ranges = re.sub(r'\s+', ' ', re.sub(r"\s+-\s+", "-", fromform))
    ranges = re.findall(r"\b(range\([^()]+\))", not_ranges)
    if ranges:
        print(ranges)


_parse_hl_lines("")
_parse_hl_lines("32 36-38")
_parse_hl_lines("32  36 - 38")
_parse_hl_lines("range(5) range(10, 15) range(20, 30, 2) 32 36-38")
_parse_hl_lines("range(5), range(10, 15), range(20, 30, 2), 32, 36-38")

me@host: ~$ python zzz.py
['range(5)', 'range(10, 15)', 'range(20, 30, 2)']
['range(5)', 'range(10, 15)', 'range(20, 30, 2)']

…"I want to turn the ranges I collected into a regex for pulling out everything else, so…"

>>> ranges = ['range(5)', 'range(10, 15)', 'range(20, 30, 2)']
>>> "|".join(ranges)
'range(5)|range(10, 15)|range(20, 30, 2)'

…"Ugh, it's a regex, so the parentheses need escaping…"

>>> import re
>>> re.sub(r"([()])", r"\\\1", "|".join(ranges))
'range\\(5\\)|range\\(10, 15\\)|range\\(20, 30, 2\\)'
>>> "(%s)" % re.sub(r"([()])", r"\\\1", "|".join(ranges))
'(range\\(5\\)|range\\(10, 15\\)|range\\(20, 30, 2\\))'
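
The hand-rolled escaping above only covers parentheses. As a hedged alternative (not what the script in this post uses), the standard library's `re.escape()` escapes every regex metacharacter in a literal string, so each piece can be escaped before joining. A minimal sketch:

```python
import re

ranges = ['range(5)', 'range(10, 15)', 'range(20, 30, 2)']

# Escape each literal piece first, then join with "|" so the alternation
# bars themselves stay unescaped.
pattern = "(%s)" % "|".join(re.escape(r) for r in ranges)

leftover = re.sub(
    pattern, "",
    "range(5), range(10, 15), range(20, 30, 2), 32, 36-38")
print(repr(leftover))
```

The exact text of `pattern` differs between Python versions because `re.escape()` has changed which characters it escapes, but the pattern matches the same literal strings either way.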

…"OK, with this I can pull the not_ranges part out of the original input string fromform…" (back to the script…)

# -*- coding: utf-8 -*-
import re


def _parse_hl_lines(fromform):
    not_ranges = re.sub(r'\s+', ' ', re.sub(r"\s+-\s+", "-", fromform))
    ranges = re.findall(r"\b(range\([^()]+\))", not_ranges)
    if ranges:
        not_ranges = re.sub(
            "(%s)" % re.sub(r"([()])", r"\\\1", "|".join(ranges)),
            "", fromform)
        print(not_ranges)


#_parse_hl_lines("")
_parse_hl_lines("32 36-38")
_parse_hl_lines("32  36 - 38")
_parse_hl_lines("range(5) range(10, 15) range(20, 30, 2) 32 36-38")
_parse_hl_lines("range(5), range(10, 15), range(20, 30, 2), 32, 36-38")

me@host: ~$ python zzz.py
   32 36-38
, , , 32, 36-38

…"not_ranges is fine like this. For ranges, I don't need the literal "range" or the parentheses…"

# -*- coding: utf-8 -*-
import re


def _parse_hl_lines(fromform):
    not_ranges = re.sub(r'\s+', ' ', re.sub(r"\s+-\s+", "-", fromform))
    ranges = re.findall(r"\b(range\([^()]+\))", not_ranges)
    if ranges:
        not_ranges = re.sub(
            "(%s)" % re.sub(r"([()])", r"\\\1", "|".join(ranges)),
            "", fromform)
        ranges = [re.sub(r"\brange\(([^()]+)\)", r"\1", r) for r in ranges]
        print(ranges)
        print(not_ranges)


_parse_hl_lines("")
_parse_hl_lines("32 36-38")
_parse_hl_lines("32  36 - 38")
_parse_hl_lines("range(5) range(10, 15) range(20, 30, 2) 32 36-38")
_parse_hl_lines("range(5), range(10, 15), range(20, 30, 2), 32, 36-38")

me@host: ~$ python zzz.py
['5', '10, 15', '20, 30, 2']
   32 36-38
['5', '10, 15', '20, 30, 2']
, , , 32, 36-38

…"Let me just write the loop skeletons for ranges and not_ranges…"

# -*- coding: utf-8 -*-
import re


def _parse_hl_lines(fromform):
    not_ranges = re.sub(r'\s+', ' ', re.sub(r"\s+-\s+", "-", fromform))
    ranges = re.findall(r"\b(range\([^()]+\))", not_ranges)
    if ranges:
        not_ranges = re.sub(
            "(%s)" % re.sub(r"([()])", r"\\\1", "|".join(ranges)),
            "", fromform)
        ranges = [re.sub(r"\brange\(([^()]+)\)", r"\1", r) for r in ranges]

    result = []
    for s in re.split(r"[\s,]+", not_ranges):
        if not s.strip():
            continue
        print(s)
    for sl in ranges:
        print(sl)
    return result

_parse_hl_lines("")
_parse_hl_lines("32 36-38")
_parse_hl_lines("32  36 - 38")
_parse_hl_lines("range(5) range(10, 15) range(20, 30, 2) 32 36-38")
_parse_hl_lines("range(5), range(10, 15), range(20, 30, 2), 32, 36-38")

me@host: ~$ python zzz.py
32
36-38
32
36-38
32
36-38
5
10, 15
20, 30, 2
32
36-38
5
10, 15
20, 30, 2

…"From here on it's easier to write doctests…"

# -*- coding: utf-8 -*-
import re


def _parse_hl_lines(fromform):
    """
    >>> _parse_hl_lines("")
    []
    >>> _parse_hl_lines("32 36-38")
    [32, 36, 37, 38]
    >>> _parse_hl_lines("32  36 - 38")
    [32, 36, 37, 38]
    >>> _parse_hl_lines("range(5) range(10, 15) range(20, 30, 2) 32 36-38")
    [0, 1, 2, 3, 4, 10, 11, 12, 13, 14, 20, 22, 24, 26, 28, 32, 36, 37, 38]
    >>> _parse_hl_lines("range(5), range(10, 15), range(20, 30, 2), 32, 36-38")
    [0, 1, 2, 3, 4, 10, 11, 12, 13, 14, 20, 22, 24, 26, 28, 32, 36, 37, 38]
    """
    not_ranges = re.sub(r'\s+', ' ', re.sub(r"\s+-\s+", "-", fromform))
    ranges = re.findall(r"\b(range\([^()]+\))", not_ranges)
    if ranges:
        not_ranges = re.sub(
            "(%s)" % re.sub(r"([()])", r"\\\1", "|".join(ranges)),
            "", fromform)
        ranges = [re.sub(r"\brange\(([^()]+)\)", r"\1", r) for r in ranges]

    result = []
    for s in re.split(r"[\s,]+", not_ranges):
        if not s.strip():
            continue
        print(s)
    for sl in ranges:
        print(sl)
    return result


if __name__ == '__main__':
    import doctest
    doctest.testmod()

me@host: ~$ python zzz.py
**********************************************************************
File "zzz.py", line 9, in __main__._parse_hl_lines
Failed example:
    _parse_hl_lines("32 36-38")
Expected:
    [32, 36, 37, 38]
Got:
    32
    36-38
    []
**********************************************************************
File "zzz.py", line 11, in __main__._parse_hl_lines
Failed example:
    _parse_hl_lines("32  36 - 38")
Expected:
    [32, 36, 37, 38]
Got:
    32
    36-38
    []
**********************************************************************
File "zzz.py", line 13, in __main__._parse_hl_lines
Failed example:
    _parse_hl_lines("range(5) range(10, 15) range(20, 30, 2) 32 36-38")
Expected:
    [0, 1, 2, 3, 4, 10, 11, 12, 13, 14, 20, 22, 24, 26, 28, 32, 36, 37, 38]
Got:
    32
    36-38
    5
    10, 15
    20, 30, 2
    []
**********************************************************************
File "zzz.py", line 15, in __main__._parse_hl_lines
Failed example:
    _parse_hl_lines("range(5), range(10, 15), range(20, 30, 2), 32, 36-38")
Expected:
    [0, 1, 2, 3, 4, 10, 11, 12, 13, 14, 20, 22, 24, 26, 28, 32, 36, 37, 38]
Got:
    32
    36-38
    5
    10, 15
    20, 30, 2
    []
**********************************************************************
1 items had failures:
   4 of   5 in __main__._parse_hl_lines
***Test Failed*** 4 failures.

…"Ah, too noisy, drop the prints…"

# -*- coding: utf-8 -*-
import re


def _parse_hl_lines(fromform):
    """
    >>> _parse_hl_lines("")
    []
    >>> _parse_hl_lines("32 36-38")
    [32, 36, 37, 38]
    >>> _parse_hl_lines("32  36 - 38")
    [32, 36, 37, 38]
    >>> _parse_hl_lines("range(5) range(10, 15) range(20, 30, 2) 32 36-38")
    [0, 1, 2, 3, 4, 10, 11, 12, 13, 14, 20, 22, 24, 26, 28, 32, 36, 37, 38]
    >>> _parse_hl_lines("range(5), range(10, 15), range(20, 30, 2), 32, 36-38")
    [0, 1, 2, 3, 4, 10, 11, 12, 13, 14, 20, 22, 24, 26, 28, 32, 36, 37, 38]
    """
    not_ranges = re.sub(r'\s+', ' ', re.sub(r"\s+-\s+", "-", fromform))
    ranges = re.findall(r"\b(range\([^()]+\))", not_ranges)
    if ranges:
        not_ranges = re.sub(
            "(%s)" % re.sub(r"([()])", r"\\\1", "|".join(ranges)),
            "", fromform)
        ranges = [re.sub(r"\brange\(([^()]+)\)", r"\1", r) for r in ranges]

    result = []
    for s in re.split(r"[\s,]+", not_ranges):
        if not s.strip():
            continue
    for sl in ranges:
        pass
    return result


if __name__ == '__main__':
    import doctest
    doctest.testmod()

me@host: ~$ python zzz.py
**********************************************************************
File "zzz.py", line 9, in __main__._parse_hl_lines
Failed example:
    _parse_hl_lines("32 36-38")
Expected:
    [32, 36, 37, 38]
Got:
    []
**********************************************************************
File "zzz.py", line 11, in __main__._parse_hl_lines
Failed example:
    _parse_hl_lines("32  36 - 38")
Expected:
    [32, 36, 37, 38]
Got:
    []
**********************************************************************
File "zzz.py", line 13, in __main__._parse_hl_lines
Failed example:
    _parse_hl_lines("range(5) range(10, 15) range(20, 30, 2) 32 36-38")
Expected:
    [0, 1, 2, 3, 4, 10, 11, 12, 13, 14, 20, 22, 24, 26, 28, 32, 36, 37, 38]
Got:
    []
**********************************************************************
File "zzz.py", line 15, in __main__._parse_hl_lines
Failed example:
    _parse_hl_lines("range(5), range(10, 15), range(20, 30, 2), 32, 36-38")
Expected:
    [0, 1, 2, 3, 4, 10, 11, 12, 13, 14, 20, 22, 24, 26, 28, 32, 36, 37, 38]
Got:
    []
**********************************************************************
1 items had failures:
   4 of   5 in __main__._parse_hl_lines
***Test Failed*** 4 failures.

…"not_ranges items become ints; the hyphenated ones can be done with range…"

# -*- coding: utf-8 -*-
import re


def _parse_hl_lines(fromform):
    """
    >>> _parse_hl_lines("")
    []
    >>> _parse_hl_lines("32 36-38")
    [32, 36, 37, 38]
    >>> _parse_hl_lines("32  36 - 38")
    [32, 36, 37, 38]
    >>> _parse_hl_lines("range(5) range(10, 15) range(20, 30, 2) 32 36-38")
    [0, 1, 2, 3, 4, 10, 11, 12, 13, 14, 20, 22, 24, 26, 28, 32, 36, 37, 38]
    >>> _parse_hl_lines("range(5), range(10, 15), range(20, 30, 2), 32, 36-38")
    [0, 1, 2, 3, 4, 10, 11, 12, 13, 14, 20, 22, 24, 26, 28, 32, 36, 37, 38]
    """
    not_ranges = re.sub(r'\s+', ' ', re.sub(r"\s+-\s+", "-", fromform))
    ranges = re.findall(r"\b(range\([^()]+\))", not_ranges)
    if ranges:
        not_ranges = re.sub(
            "(%s)" % re.sub(r"([()])", r"\\\1", "|".join(ranges)),
            "", fromform)
        ranges = [re.sub(r"\brange\(([^()]+)\)", r"\1", r) for r in ranges]

    result = []
    for s in re.split(r"[\s,]+", not_ranges):
        if not s.strip():
            continue
        if "-" in s:
            spl = s.split("-")
            result.extend(range(int(spl[0]), int(spl[1]) + 1))
        else:
            result.append(int(s))
    for sl in ranges:
        pass
    return result


if __name__ == '__main__':
    import doctest
    doctest.testmod()

me@host: ~$ python zzz.py
**********************************************************************
File "zzz.py", line 13, in __main__._parse_hl_lines
Failed example:
    _parse_hl_lines("range(5) range(10, 15) range(20, 30, 2) 32 36-38")
Expected:
    [0, 1, 2, 3, 4, 10, 11, 12, 13, 14, 20, 22, 24, 26, 28, 32, 36, 37, 38]
Got:
    [32, 36, 37, 38]
**********************************************************************
File "zzz.py", line 15, in __main__._parse_hl_lines
Failed example:
    _parse_hl_lines("range(5), range(10, 15), range(20, 30, 2), 32, 36-38")
Expected:
    [0, 1, 2, 3, 4, 10, 11, 12, 13, 14, 20, 22, 24, 26, 28, 32, 36, 37, 38]
Got:
    [32, 36, 37, 38]
**********************************************************************
1 items had failures:
   2 of   5 in __main__._parse_hl_lines
***Test Failed*** 2 failures.

…"The ranges items should pass straight through to range, right…"

>>> range(*map(int, ["5"]))
[0, 1, 2, 3, 4]
>>> range(*map(int, ["10, 15"]))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '10, 15'
>>> range(*map(int, ["10, 15"].split(",")))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'list' object has no attribute 'split'
>>> range(*map(int, [10, 15".split(",")))
  File "<stdin>", line 1
    range(*map(int, [10, 15".split(",")))
                                   ^
SyntaxError: invalid syntax
>>> range(*map(int, "10, 15".split(",")))
[10, 11, 12, 13, 14]
>>> range(*map(int, "20, 30, 2".split(",")))
[20, 22, 24, 26, 28]

…"So it goes like this…"

# -*- coding: utf-8 -*-
import re


def _parse_hl_lines(fromform):
    """
    >>> _parse_hl_lines("")
    []
    >>> _parse_hl_lines("32 36-38")
    [32, 36, 37, 38]
    >>> _parse_hl_lines("32  36 - 38")
    [32, 36, 37, 38]
    >>> _parse_hl_lines("range(5) range(10, 15) range(20, 30, 2) 32 36-38")
    [0, 1, 2, 3, 4, 10, 11, 12, 13, 14, 20, 22, 24, 26, 28, 32, 36, 37, 38]
    >>> _parse_hl_lines("range(5), range(10, 15), range(20, 30, 2), 32, 36-38")
    [0, 1, 2, 3, 4, 10, 11, 12, 13, 14, 20, 22, 24, 26, 28, 32, 36, 37, 38]
    """
    not_ranges = re.sub(r'\s+', ' ', re.sub(r"\s+-\s+", "-", fromform))
    ranges = re.findall(r"\b(range\([^()]+\))", not_ranges)
    if ranges:
        not_ranges = re.sub(
            "(%s)" % re.sub(r"([()])", r"\\\1", "|".join(ranges)),
            "", fromform)
        ranges = [re.sub(r"\brange\(([^()]+)\)", r"\1", r) for r in ranges]

    result = []
    for s in re.split(r"[\s,]+", not_ranges):
        if not s.strip():
            continue
        if "-" in s:
            spl = s.split("-")
            result.extend(range(int(spl[0]), int(spl[1]) + 1))
        else:
            result.append(int(s))
    for sl in ranges:
        result.extend(range(*map(int, sl.split(","))))
    return result


if __name__ == '__main__':
    import doctest
    doctest.testmod()

me@host: ~$ python zzz.py
**********************************************************************
File "zzz.py", line 13, in __main__._parse_hl_lines
Failed example:
    _parse_hl_lines("range(5) range(10, 15) range(20, 30, 2) 32 36-38")
Expected:
    [0, 1, 2, 3, 4, 10, 11, 12, 13, 14, 20, 22, 24, 26, 28, 32, 36, 37, 38]
Got:
    [32, 36, 37, 38, 0, 1, 2, 3, 4, 10, 11, 12, 13, 14, 20, 22, 24, 26, 28]
**********************************************************************
File "zzz.py", line 15, in __main__._parse_hl_lines
Failed example:
    _parse_hl_lines("range(5), range(10, 15), range(20, 30, 2), 32, 36-38")
Expected:
    [0, 1, 2, 3, 4, 10, 11, 12, 13, 14, 20, 22, 24, 26, 28, 32, 36, 37, 38]
Got:
    [32, 36, 37, 38, 0, 1, 2, 3, 4, 10, 11, 12, 13, 14, 20, 22, 24, 26, 28]
**********************************************************************
1 items had failures:
   2 of   5 in __main__._parse_hl_lines
***Test Failed*** 2 failures.

…"Good, now it just needs sorting…"

# -*- coding: utf-8 -*-
import re


def _parse_hl_lines(fromform):
    """
    >>> _parse_hl_lines("")
    []
    >>> _parse_hl_lines("32 36-38")
    [32, 36, 37, 38]
    >>> _parse_hl_lines("32  36 - 38")
    [32, 36, 37, 38]
    >>> _parse_hl_lines("range(5) range(10, 15) range(20, 30, 2) 32 36-38")
    [0, 1, 2, 3, 4, 10, 11, 12, 13, 14, 20, 22, 24, 26, 28, 32, 36, 37, 38]
    >>> _parse_hl_lines("range(5), range(10, 15), range(20, 30, 2), 32, 36-38")
    [0, 1, 2, 3, 4, 10, 11, 12, 13, 14, 20, 22, 24, 26, 28, 32, 36, 37, 38]
    """
    # Turn " - " into "-", then collapse every whitespace run to one space.
    not_ranges = re.sub(r'\s+', ' ', re.sub(r"\s+-\s+", "-", fromform))
    # Collect every range(...) item.
    ranges = re.findall(r"\b(range\([^()]+\))", not_ranges)
    if ranges:
        # Cut the range(...) items out, leaving plain numbers and N-M spans.
        not_ranges = re.sub(
            "(%s)" % re.sub(r"([()])", r"\\\1", "|".join(ranges)),
            "", fromform)
        # Keep only the argument text between the parentheses.
        ranges = [re.sub(r"\brange\(([^()]+)\)", r"\1", r) for r in ranges]

    result = []
    for s in re.split(r"[\s,]+", not_ranges):
        if not s.strip():
            continue
        if "-" in s:
            # "36-38" is inclusive on both ends.
            spl = s.split("-")
            result.extend(range(int(spl[0]), int(spl[1]) + 1))
        else:
            result.append(int(s))
    for sl in ranges:
        # "20, 30, 2" becomes range(20, 30, 2).
        result.extend(range(*map(int, sl.split(","))))
    # Drop duplicates and sort ascending.
    result = list(set(result))
    result.sort()
    return result


if __name__ == '__main__':
    import doctest
    doctest.testmod()

me@host: ~$ python zzz.py
me@host: ~$ 
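
For comparison, here is a hedged sketch of a different approach, not the one worked out above: a single regex that recognizes all three kinds of item in one pass, so no string surgery is needed to separate the range(...) items from the rest. The names `TOKEN` and `parse_hl_lines_onepass` are hypothetical. It also accepts an input the doctests above do not cover, a range(...) item mixed with a spaced hyphen such as "range(5) 36 - 38", which the listing above rejects with a ValueError because the range items are cut out of the raw fromform rather than the normalized string.

```python
import re

# Hypothetical alternative: one alternation, tried left to right.
TOKEN = re.compile(
    r"range\(([^()]+)\)"      # group 1: argument text of range(...)
    r"|(\d+)\s*-\s*(\d+)"     # groups 2 and 3: inclusive span like 36 - 38
    r"|(\d+)"                 # group 4: a single line number
)


def parse_hl_lines_onepass(spec):
    """Expand a spec such as "range(5), 32, 36 - 38" into sorted line numbers."""
    found = set()
    for m in TOKEN.finditer(spec):
        args, first, last, single = m.groups()
        if args is not None:
            found.update(range(*map(int, args.split(","))))
        elif first is not None:
            found.update(range(int(first), int(last) + 1))
        else:
            found.add(int(single))
    # A set removes duplicates; sorted() returns an ascending list.
    return sorted(found)


print(parse_hl_lines_onepass("range(5) 36 - 38"))
```

The trade-off is leniency: anything the regex does not recognize is silently skipped, which may or may not be what a form handler wants.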

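End of the mental play-by-play.

Now, a question.

How many minutes did this post take you to read?

About five, I'd guess.

I wrote this post in about 30 minutes.

One more. Can you guess how long it actually took to write this _parse_hl_lines?

The answer: about 20 minutes.

So, what was the "experiment"? That's a secret. Well, that would be a bit much, so…

As a software developer I've met people at all sorts of skill levels. What I've always felt is that the gap in intuitive things like a sense of rhythm and a sense of speed is astonishingly wide. There is a hopeless distance between people who can and people who can't, and yet far too often it is the people who can't who fail to notice exactly this. Take the "about 20 minutes" above: writing the script, starting the interactive interpreter, writing the doctests, running the doctests. I go through that whole cycle in one breath, but people on the "can't" side never notice, and never imagine that I'm doing all of that.

In the end, I think this is because nobody can vicariously experience being someone who can. Of course not: they're a different person.

If so, it comes down to one question: "How can we create more opportunities for that vicarious experience?" That's the reason I'm so fixated on things like console-session highlighting in Pygments and on conveying things through video and images.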