I am producing a series of instructional videos on programming education for children, and there is a large amount of English-language video material worth learning from. However, those videos have no subtitles, which may get in the way of students' learning. With English subtitles in hand, tools such as Google Translate can then be used to produce Chinese subtitles. I therefore set out to find a fast way to generate subtitles, and this article records the process. My computer runs 64-bit Windows 10.
Note: your computer must be able to reach Google by some means!
The overall process can be divided into:
1. Install Python 2
2. Download and configure ffmpeg
3. Download and modify autosub
4. Run the command to generate subtitles

Each step is described in turn below.
1. Install Python 2

Python 2 is needed here because autosub, which is called later, is written in Python 2. I tried Python 3, but it would apparently require extensive changes to run, so in the end I installed Python 2 as the autosub instructions suggest. I recommend installing Python 2 through Anaconda2, which saves the trouble of downloading and configuring the dependent packages yourself. Also note that autosub recommends a 32-bit Python; I have not tested whether a 64-bit Python works. Launching the downloaded Anaconda2 (32-bit) installer shows:
Click "Next":
Select "I Agree":
I usually choose "All Users", then click "Next":
Choose an installation folder whose path contains no spaces or Chinese characters, for example "D:\Anaconda2":
I do not recommend ticking the first option, as it can cause confusion between different Python versions; for example, I also have Python 3 installed. Leave the second option ticked, as it is by default:
Click "Next" and the installation is complete:
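Once the installation finishes, a quick sanity check is to confirm that the Anaconda2 interpreter is the one actually being used. The short script below (not part of the later workflow, just a check) can be run from the Anaconda Powershell Prompt (Anaconda2):

# check_python.py - confirm that Python 2.7 from Anaconda2 is the active interpreter
import sys

print("Python version:   " + sys.version.split()[0])
print("Interpreter path: " + sys.executable)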
2. Download and configure ffmpeg

ffmpeg is used here mainly to extract the audio from the video.
(1) From the site "https://ffmpeg.zeranoe.com/builds/", download an ffmpeg build that matches your operating system; I downloaded ffmpeg-3.2-win64-static.
(2) Unzip the downloaded archive, optionally rename the folder to ffmpeg, and copy the whole folder into "D:\Anaconda2" (which directory you copy it to is up to you).
(3) Add the "bin" directory inside the unzipped folder to the system Path environment variable.
First press Win+R to open the Run dialog and enter sysdm.cpl:
Click OK to open the System Properties window, then switch to the Advanced tab:
Click Environment Variables and, in the System variables list, find Path:
Select the Path variable and click Edit. Click New and add the path of the "bin" folder from the unzipped ffmpeg directory into the empty field:
Click OK. This completes the download and configuration of ffmpeg.
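To verify that the Path change took effect, you can type ffmpeg -version in a newly opened command window, or run a small Python check like the one below (it only assumes that ffmpeg.exe is reachable through Path):

# check_ffmpeg.py - confirm that ffmpeg can be launched via the Path environment variable
import subprocess

try:
    output = subprocess.check_output(["ffmpeg", "-version"])
    print(output.splitlines()[0])  # the first line of the output contains the version string
except OSError:
    print("ffmpeg was not found - re-check the Path entry for the bin directory")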
3. Download and modify autosub

autosub is the tool that generates the subtitles automatically; for the speech-to-text part it calls the Google Cloud Speech API.
(1) Install autosub with Anaconda2
After installing Anaconda2 you will find the tool Anaconda Powershell Prompt (Anaconda2). Open it and enter:
pip install autosub

This completes the installation of autosub.
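If you are unsure where pip put autosub, the following short script prints both the package directory and the Scripts directory of the Anaconda2 installation (it assumes the default Anaconda2 layout described above):

# locate_autosub.py - show where the autosub package and its entry script were installed
import os
import sys
import autosub

print("Package directory: " + os.path.dirname(autosub.__file__))
print("Scripts directory: " + os.path.join(sys.prefix, "Scripts"))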
(2) Rename autosub
After installation, the autosub script is located in "D:\Anaconda2\Scripts"; rename it to "autosub_app.py".
(3) Modify the code in autosub_app.py
The key modifications are explained here; the complete autosub_app.py is given at the end of this article.
At line 48 of the code, add ", delete=False" so that the temporary file is not deleted, i.e. change:

temp = tempfile.NamedTemporaryFile(suffix='.flac')

to:

temp = tempfile.NamedTemporaryFile(suffix='.flac', delete=False)
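The reason the default does not work here is that on Windows a NamedTemporaryFile created with the default delete=True generally cannot be opened a second time (in this case by ffmpeg writing to temp.name) while the Python handle is still open; with delete=False the file behaves like a normal file, at the cost of having to remove it yourself. A minimal sketch of the same pattern, assuming ffmpeg is already on the Path:

# temp_demo.py - let ffmpeg write into a named temporary file created with delete=False
import os
import subprocess
import tempfile

temp = tempfile.NamedTemporaryFile(suffix='.flac', delete=False)
temp.close()  # release the Python handle; the file stays on disk because delete=False
# have ffmpeg encode one second of silence into the temporary file
subprocess.check_output(["ffmpeg", "-y", "-f", "lavfi", "-i", "anullsrc",
                         "-t", "1", "-loglevel", "error", temp.name])
print("bytes written: " + str(os.path.getsize(temp.name)))
os.remove(temp.name)  # delete=False means we clean up manually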
At line 127, append ".exe" so that ffmpeg.exe is found correctly, i.e. change:

exe_file = os.path.join(path, program)

to:

exe_file = os.path.join(path, program + ".exe")
Add the proxy information. After the imports, add a global proxy_dict; at this point it is only a dictionary definition:
proxy_dict = {
    'http': 'http://127.0.0.1:8118',
    'https': 'https://127.0.0.1:8118',
    'use': False
}
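The keys 'http' and 'https' are used because this is exactly the shape of the proxies mapping that requests expects. Before going further, you can check that the proxy actually reaches Google with a couple of lines like these (the address 127.0.0.1:8118 is the one used throughout this article and may be different on your machine):

# proxy_test.py - check that Google is reachable through the local proxy
import requests

proxies = {'http': 'http://127.0.0.1:8118',
           'https': 'https://127.0.0.1:8118'}
resp = requests.get("https://www.google.com", proxies=proxies, timeout=10)
print(resp.status_code)  # 200 means the proxy connection works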
Then modify the SpeechRecognizer class: add a proxy parameter to the __init__ method, and add logic to the __call__ method that decides, based on this setting, whether or not to send the POST request through the proxy. In addition, I suggest adding a print statement after requests.exceptions.ConnectionError is caught; otherwise, when the connection to Google's server fails, no error is reported at all and the program simply ends up with a 0-byte .srt subtitle file:
except requests.exceptions.ConnectionError:
    print "ConnectionError\n"
    continue

The modified SpeechRecognizer class looks like this:
class SpeechRecognizer(object):
    def __init__(self, language="en", rate=44100, retries=3, api_key=GOOGLE_SPEECH_API_KEY, proxy=proxy_dict):
        self.language = language
        self.rate = rate
        self.api_key = api_key
        self.retries = retries
        self.proxy = proxy

    def __call__(self, data):
        try:
            for i in range(self.retries):
                url = GOOGLE_SPEECH_API_URL.format(lang=self.language, key=self.api_key)
                headers = {"Content-Type": "audio/x-flac; rate=%d" % self.rate}
                try:
                    if self.proxy['use']:
                        resp = requests.post(url, data=data, headers=headers, proxies=self.proxy)
                    else:
                        resp = requests.post(url, data=data, headers=headers)
                except requests.exceptions.ConnectionError:
                    print "ConnectionError\n"
                    continue

                for line in resp.content.split("\n"):
                    try:
                        line = json.loads(line)
                        line = line['result'][0]['alternative'][0]['transcript']
                        return line[:1].upper() + line[1:]
                    except:
                        # no result
                        continue
        except KeyboardInterrupt:
            return
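For reference only, this is a hypothetical standalone use of the modified class on a single FLAC chunk; the file name sample.flac and the sys.path entry are placeholders and not part of the original workflow:

# recognize_one.py - hypothetical standalone call of the modified SpeechRecognizer
import sys
sys.path.insert(0, r"D:\Anaconda2\Scripts")  # directory that holds autosub_app.py

from autosub_app import SpeechRecognizer, proxy_dict

proxy_dict.update({'http': 'http://127.0.0.1:8118',
                   'https': 'https://127.0.0.1:8118',
                   'use': True})

recognizer = SpeechRecognizer(language="en", rate=44100, proxy=proxy_dict)
with open("sample.flac", "rb") as f:  # sample.flac is a placeholder FLAC chunk
    print(recognizer(f.read()))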
In the main method, add code to parse a proxy argument, so that the proxy can be set from the command line:

parser.add_argument('-P', '--proxy', help="Set proxy server")

args = parser.parse_args()

if args.proxy:
    proxy_dict.update({
        'http': args.proxy,
        'https': args.proxy,
        'use': True
    })
    print("Use proxy " + args.proxy)

This completes the code changes; the subtitles can now be generated by running the program from the command line.
4. Run the command to generate subtitles

(1) Obtain the proxy configuration (if you are on a network with direct access to Google, you can skip this step)
First, find the computer's proxy configuration. On Windows 10, right-click the network icon in the lower-right corner of the desktop, open the "Network & Internet" settings, and click Proxy at the bottom of the left-hand menu:
Copy the script address shown under the automatic proxy setup, paste it into the browser address bar to open it, scroll to the bottom, and find:
var proxy = "PROXY 127.0.0.1:8118; DIRECT;";
var direct = 'DIRECT;';

This reveals the proxy IP and port, in this case 127.0.0.1:8118. (Depending on which tool you use, the IP and port may differ.)
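If you prefer not to read the PAC script by hand, a few lines of Python can pull the "PROXY ip:port" entries out of it; the script address below is a placeholder that you would replace with the address copied from the proxy settings page:

# find_proxy.py - download the PAC script and list the "PROXY ip:port" entries it declares
import re
import requests

pac_url = "http://127.0.0.1:1080/pac"  # placeholder: use the script address from the proxy settings
pac_text = requests.get(pac_url).text
print(re.findall(r"PROXY\s+([0-9.]+:[0-9]+)", pac_text))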
(2) The subtitle extraction command
Open the Anaconda Powershell Prompt (Anaconda2) again and change the working directory to the folder containing the video to be subtitled. For example, if the video "01_HowComputersWork_sm.mp4" is in the root of drive D, first switch the working directory to drive D and then run:
python D:\Anaconda2\Scripts\autosub_app.py -S en -D en -P http://127.0.0.1:8118 .\01_HowComputersWork_sm.mp4

If you are on a network with direct access to Google, omit -P and its value; the command is then:
python D:\Anaconda2\Scripts\autosub_app.py -S en -D en .\01_HowComputersWork_sm.mp4

The program runs as follows:
At the very end of the run a WindowsError may be reported. I have not yet found a fix for this, but it does not affect the result: the subtitles are generated successfully, and "01_HowComputersWork_sm.srt" appears in the root of drive D. Opening the video and loading the subtitles looks like this:
Of course, the automatically generated subtitles still need to be reviewed and corrected.
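A quick way to skim the result before loading it into a player is to print the first few entries of the generated file; the file name below is the one from the example above:

# preview_srt.py - print the first few entries of the generated subtitle file for review
import io

with io.open("01_HowComputersWork_sm.srt", "r", encoding="utf-8") as f:
    blocks = f.read().strip().split("\n\n")  # srt entries are separated by blank lines

for block in blocks[:5]:
    print(block)
    print("-" * 30)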
The complete autosub_app.py:
#!D:\Anaconda2\python.exe
import argparse
import audioop
from googleapiclient.discovery import build
import json
import math
import multiprocessing
import os
import requests
import subprocess
import sys
import tempfile
import wave

from progressbar import ProgressBar, Percentage, Bar, ETA

from autosub.constants import LANGUAGE_CODES, \
    GOOGLE_SPEECH_API_KEY, GOOGLE_SPEECH_API_URL
from autosub.formatters import FORMATTERS

proxy_dict = {
    'http': 'http://127.0.0.1:8118',
    'https': 'https://127.0.0.1:8118',
    'use': False
}


def percentile(arr, percent):
    arr = sorted(arr)
    k = (len(arr) - 1) * percent
    f = math.floor(k)
    c = math.ceil(k)
    if f == c:
        return arr[int(k)]
    d0 = arr[int(f)] * (c - k)
    d1 = arr[int(c)] * (k - f)
    return d0 + d1


def is_same_language(lang1, lang2):
    return lang1.split("-")[0] == lang2.split("-")[0]


class FLACConverter(object):
    def __init__(self, source_path, include_before=0.25, include_after=0.25):
        self.source_path = source_path
        self.include_before = include_before
        self.include_after = include_after

    def __call__(self, region):
        try:
            start, end = region
            start = max(0, start - self.include_before)
            end += self.include_after
            temp = tempfile.NamedTemporaryFile(suffix='.flac', delete=False)
            command = ["ffmpeg", "-ss", str(start), "-t", str(end - start),
                       "-y", "-i", self.source_path,
                       "-loglevel", "error", temp.name]
            subprocess.check_output(command, stdin=open(os.devnull))
            return temp.read()
        except KeyboardInterrupt:
            return


class SpeechRecognizer(object):
    def __init__(self, language="en", rate=44100, retries=3, api_key=GOOGLE_SPEECH_API_KEY, proxy=proxy_dict):
        self.language = language
        self.rate = rate
        self.api_key = api_key
        self.retries = retries
        self.proxy = proxy

    def __call__(self, data):
        try:
            for i in range(self.retries):
                url = GOOGLE_SPEECH_API_URL.format(lang=self.language, key=self.api_key)
                headers = {"Content-Type": "audio/x-flac; rate=%d" % self.rate}
                try:
                    if self.proxy['use']:
                        resp = requests.post(url, data=data, headers=headers, proxies=self.proxy)
                    else:
                        resp = requests.post(url, data=data, headers=headers)
                except requests.exceptions.ConnectionError:
                    print "ConnectionError\n"
                    continue

                for line in resp.content.split("\n"):
                    try:
                        line = json.loads(line)
                        line = line['result'][0]['alternative'][0]['transcript']
                        return line[:1].upper() + line[1:]
                    except:
                        # no result
                        continue
        except KeyboardInterrupt:
            return


class Translator(object):
    def __init__(self, language, api_key, src, dst):
        self.language = language
        self.api_key = api_key
        self.service = build('translate', 'v2', developerKey=self.api_key)
        self.src = src
        self.dst = dst

    def __call__(self, sentence):
        try:
            if not sentence:
                return
            result = self.service.translations().list(
                source=self.src,
                target=self.dst,
                q=[sentence]
            ).execute()
            if 'translations' in result and len(result['translations']) and \
                    'translatedText' in result['translations'][0]:
                return result['translations'][0]['translatedText']
            return ""
        except KeyboardInterrupt:
            return


def which(program):
    def is_exe(fpath):
        return os.path.isfile(fpath) and os.access(fpath, os.X_OK)
    fpath, fname = os.path.split(program)
    if fpath:
        if is_exe(program):
            return program
    else:
        for path in os.environ["PATH"].split(os.pathsep):
            path = path.strip('"')
            exe_file = os.path.join(path, program + ".exe")
            if is_exe(exe_file):
                return exe_file
    return None


def extract_audio(filename, channels=1, rate=16000):
    temp = tempfile.NamedTemporaryFile(suffix='.wav', delete=False)
    if not os.path.isfile(filename):
        print "The given file does not exist: {0}".format(filename)
        raise Exception("Invalid filepath: {0}".format(filename))
    if not which("ffmpeg"):
        print "ffmpeg: Executable not found on machine."
        raise Exception("Dependency not found: ffmpeg")
    command = ["ffmpeg", "-y", "-i", filename, "-ac", str(channels), "-ar", str(rate), "-loglevel", "error", temp.name]
    subprocess.check_output(command, stdin=open(os.devnull))
    return temp.name, rate


def find_speech_regions(filename, frame_width=4096, min_region_size=0.5, max_region_size=6):
    reader = wave.open(filename)
    sample_width = reader.getsampwidth()
    rate = reader.getframerate()
    n_channels = reader.getnchannels()

    total_duration = reader.getnframes() / rate
    chunk_duration = float(frame_width) / rate
    n_chunks = int(total_duration / chunk_duration)
    energies = []

    for i in range(n_chunks):
        chunk = reader.readframes(frame_width)
        energies.append(audioop.rms(chunk, sample_width * n_channels))

    threshold = percentile(energies, 0.2)
    elapsed_time = 0
    regions = []
    region_start = None

    for energy in energies:
        is_silence = energy <= threshold
        max_exceeded = region_start and elapsed_time - region_start >= max_region_size

        if (max_exceeded or is_silence) and region_start:
            if elapsed_time - region_start >= min_region_size:
                regions.append((region_start, elapsed_time))
            region_start = None
        elif (not region_start) and (not is_silence):
            region_start = elapsed_time
        elapsed_time += chunk_duration
    return regions


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('source_path', help="Path to the video or audio file to subtitle", nargs='?')
    parser.add_argument('-C', '--concurrency', help="Number of concurrent API requests to make", type=int, default=10)
    parser.add_argument('-o', '--output',
                        help="Output path for subtitles (by default, subtitles are saved in \
                        the same directory and name as the source path)")
    parser.add_argument('-F', '--format', help="Destination subtitle format", default="srt")
    parser.add_argument('-S', '--src-language', help="Language spoken in source file", default="en")
    parser.add_argument('-D', '--dst-language', help="Desired language for the subtitles", default="en")
    parser.add_argument('-K', '--api-key',
                        help="The Google Translate API key to be used. (Required for subtitle translation)")
    parser.add_argument('--list-formats', help="List all available subtitle formats", action='store_true')
    parser.add_argument('--list-languages', help="List all available source/destination languages", action='store_true')
    parser.add_argument('-P', '--proxy', help="Set proxy server")

    args = parser.parse_args()

    if args.proxy:
        proxy_dict.update({
            'http': args.proxy,
            'https': args.proxy,
            'use': True
        })
        print("Use proxy " + args.proxy)

    if args.list_formats:
        print("List of formats:")
        for subtitle_format in FORMATTERS.keys():
            print("{format}".format(format=subtitle_format))
        return 0

    if args.list_languages:
        print("List of all languages:")
        for code, language in sorted(LANGUAGE_CODES.items()):
            print("{code}\t{language}".format(code=code, language=language))
        return 0

    if args.format not in FORMATTERS.keys():
        print("Subtitle format not supported. Run with --list-formats to see all supported formats.")
        return 1

    if args.src_language not in LANGUAGE_CODES.keys():
        print("Source language not supported. Run with --list-languages to see all supported languages.")
        return 1

    if args.dst_language not in LANGUAGE_CODES.keys():
        print("Destination language not supported. Run with --list-languages to see all supported languages.")
        return 1

    if not args.source_path:
        print("Error: You need to specify a source path.")
        return 1

    audio_filename, audio_rate = extract_audio(args.source_path)
    regions = find_speech_regions(audio_filename)

    pool = multiprocessing.Pool(args.concurrency)
    converter = FLACConverter(source_path=audio_filename)
    recognizer = SpeechRecognizer(language=args.src_language, rate=audio_rate,
                                  api_key=GOOGLE_SPEECH_API_KEY, proxy=proxy_dict)

    transcripts = []
    if regions:
        try:
            widgets = ["Converting speech regions to FLAC files: ", Percentage(), ' ', Bar(), ' ', ETA()]
            pbar = ProgressBar(widgets=widgets, maxval=len(regions)).start()
            extracted_regions = []
            for i, extracted_region in enumerate(pool.imap(converter, regions)):
                extracted_regions.append(extracted_region)
                pbar.update(i)
            pbar.finish()

            widgets = ["Performing speech recognition: ", Percentage(), ' ', Bar(), ' ', ETA()]
            pbar = ProgressBar(widgets=widgets, maxval=len(regions)).start()
            for i, transcript in enumerate(pool.imap(recognizer, extracted_regions)):
                transcripts.append(transcript)
                pbar.update(i)
            pbar.finish()

            if not is_same_language(args.src_language, args.dst_language):
                if args.api_key:
                    google_translate_api_key = args.api_key
                    translator = Translator(args.dst_language, google_translate_api_key,
                                            dst=args.dst_language, src=args.src_language)
                    prompt = "Translating from {0} to {1}: ".format(args.src_language, args.dst_language)
                    widgets = [prompt, Percentage(), ' ', Bar(), ' ', ETA()]
                    pbar = ProgressBar(widgets=widgets, maxval=len(regions)).start()
                    translated_transcripts = []
                    for i, transcript in enumerate(pool.imap(translator, transcripts)):
                        translated_transcripts.append(transcript)
                        pbar.update(i)
                    pbar.finish()
                    transcripts = translated_transcripts
                else:
                    print "Error: Subtitle translation requires specified Google Translate API key. \
                    See --help for further information."
                    return 1
        except KeyboardInterrupt:
            pbar.finish()
            pool.terminate()
            pool.join()
            print "Cancelling transcription"
            return 1

    timed_subtitles = [(r, t) for r, t in zip(regions, transcripts) if t]
    formatter = FORMATTERS.get(args.format)
    formatted_subtitles = formatter(timed_subtitles)

    dest = args.output

    if not dest:
        base, ext = os.path.splitext(args.source_path)
        dest = "{base}.{format}".format(base=base, format=args.format)

    with open(dest, 'wb') as f:
        f.write(formatted_subtitles.encode("utf-8"))

    print "Subtitles file created at {}".format(dest)

    os.remove(audio_filename)

    return 0


if __name__ == '__main__':
    sys.exit(main())

Reference links:
https://github.com/agermanidis/autosub/issues/31
https://github.com/qq2225936589/autosub/blob/master/autosub_app.py