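Years ago I had the idea of adding subtitles to game videos that don't have any. The conditions weren't ripe back then, but now they seem to be.

Whisper is OpenAI's open-source speech recognition system.
It has a .NET version, and with a few small modifications that version can transcribe a game video's dialogue into an SRT subtitle file.
After that, the SRT file is batch-translated with an online translator and given a few manual tweaks, and the Chinese localization is done.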
The repository is here:
https://github.com/sandrohanea/whisper.net

It is best to build it with VS2022; otherwise you will run into many problems with the .NET SDK version.

Once it is built, there are a few points to note:
<0> Switch the model file to the large model, ggml-large.bin, which gives better results.
Of course it also takes much longer: by my estimate, transcribing a batch of files takes several hours or even tens of hours.

<1> Set Language to "english":
```csharp
/* var builder = factory.CreateBuilder()
    .WithLanguage(opt.Language);*/
// Hard-code English instead of reading the language from opt.Language
var builder = factory.CreateBuilder()
    .WithLanguage("english");
```
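<2> By default it only seems to accept WAV files with a 16 kHz sample rate, so convert your audio to this format beforehand; otherwise you will get an error.

<3> By default the sample only transcribes a single example WAV file; change it to work in batch (iterate over all the files in a directory).

<4> The output needs a little tidying up before it conforms to the SRT format.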
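Items <2> and <3> can be combined into a small pre-flight step. The sketch below uses only the .NET base library and deliberately leaves out the whisper.net call itself. The hypothetical `WavBatch.PlanBatch` lists every `.wav` file in a folder and reads the sample rate from each file's RIFF header. It reports and skips anything that is not 16 kHz, and pairs the rest with the `.srt` path to write. The conversion itself still needs an external tool. As an assumption (the original names no tool), ffmpeg produces 16 kHz mono 16-bit PCM WAV with `ffmpeg -i input.mp4 -ar 16000 -ac 1 -c:a pcm_s16le output.wav`.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper (not part of whisper.net): pre-checks WAV files for a batch run.
public static class WavBatch
{
    // Returns the sample rate stored in a RIFF/WAVE header, or null if the
    // stream is not a WAV file. Walks the chunk list until it finds "fmt ".
    public static int? ReadSampleRate(Stream s)
    {
        if (s.Length < 12) return null;
        using var r = new BinaryReader(s, Encoding.ASCII, leaveOpen: true);
        if (Encoding.ASCII.GetString(r.ReadBytes(4)) != "RIFF") return null;
        r.ReadUInt32();                                  // total RIFF size, unused
        if (Encoding.ASCII.GetString(r.ReadBytes(4)) != "WAVE") return null;

        while (s.Position + 8 <= s.Length)
        {
            string id = Encoding.ASCII.GetString(r.ReadBytes(4));
            uint size = r.ReadUInt32();
            if (id == "fmt ")
            {
                if (size < 8) return null;               // too short to hold a sample rate
                r.ReadUInt16();                          // audio format (1 = PCM)
                r.ReadUInt16();                          // channel count
                return (int)r.ReadUInt32();              // sample rate in Hz
            }
            s.Seek((long)size + (size & 1), SeekOrigin.Current); // chunk bodies are padded to an even length
        }
        return null;
    }

    // Pairs every 16 kHz .wav file in `folder` with the .srt path to write,
    // and reports (instead of crashing on) files with any other sample rate.
    public static List<(string Wav, string Srt)> PlanBatch(string folder)
    {
        var jobs = new List<(string Wav, string Srt)>();
        var files = Directory.GetFiles(folder);
        Array.Sort(files, StringComparer.Ordinal);       // predictable processing order
        foreach (var path in files)
        {
            // Case-insensitive match so ".WAV" is not missed on case-sensitive file systems.
            if (!string.Equals(Path.GetExtension(path), ".wav", StringComparison.OrdinalIgnoreCase))
                continue;

            int? rate;
            using (var fs = File.OpenRead(path)) rate = ReadSampleRate(fs);
            if (rate != 16000)
            {
                Console.WriteLine($"skip {Path.GetFileName(path)}: sample rate {rate?.ToString() ?? "unknown"}");
                continue;
            }
            jobs.Add((path, Path.ChangeExtension(path, ".srt")));
        }
        return jobs;
    }
}
```

You can then pass each returned `(Wav, Srt)` pair to the transcription code from the whisper.net sample inside a plain `foreach`. A file with the wrong sample rate is reported and skipped instead of stopping a batch that may run for hours.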

Below is the console output for one WAV file, the opening cinematic of 幽魂 (Phantasmagoria):
```
whisper_init_from_file_no_state: loading model from 'ggml-large.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab = 51865
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 1280
whisper_model_load: n_audio_head = 20
whisper_model_load: n_audio_layer = 32
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 1280
whisper_model_load: n_text_head = 20
whisper_model_load: n_text_layer = 32
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 5
whisper_model_load: mem required = 3557.00 MB (+ 71.00 MB per decoder)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: model ctx = 2951.27 MB
whisper_model_load: model size = 2950.66 MB
whisper_init_state: kv self size = 70.00 MB
whisper_init_state: kv cross size = 234.38 MB
New Segment: 00:00:00 ==> 00:00:02.7600000 : (birds chirping)
New Segment: 00:00:03.6600000 ==> 00:00:05.9000000 : (exhaling)
New Segment: 00:00:05.9000000 ==> 00:00:08.6600000 : (birds chirping)
New Segment: 00:00:08.6600000 ==> 00:00:35.1200000 : (gun firing)
New Segment: 00:00:36.1200000 ==> 00:00:38.5400000 : (gun firing)
New Segment: 00:00:39.0600000 ==> 00:00:41.4800000 : (gun firing)
New Segment: 00:00:41.4800000 ==> 00:00:49.4000000 : (tires screeching)
New Segment: 00:00:49.4000000 ==> 00:00:58.5800000 : (glass shattering)
New Segment: 00:00:58.5800000 ==> 00:01:07.7400000 : (singing in foreign language)
New Segment: 00:01:07.7400000 ==> 00:01:11.5800000 : (singing in foreign language)
New Segment: 00:01:11.5800000 ==> 00:01:17 : (tires screeching)
New Segment: 00:01:17 ==> 00:01:24.8400000 : (singing in foreign language)
New Segment: 00:01:24.8400000 ==> 00:01:28.6400000 : (panting)
New Segment: 00:01:36.7800000 ==> 00:01:39.2000000 : (gun firing)
New Segment: 00:01:39.2000000 ==> 00:01:43.4600000 : - Adrian.
New Segment: 00:01:43.4600000 ==> 00:01:45.6200000 : - Oh God.
New Segment: 00:01:45.6200000 ==> 00:01:48.2000000 : - What's the matter sweetheart?
New Segment: 00:01:48.2000000 ==> 00:01:50.4200000 : Oh.
New Segment: 00:01:50.4200000 ==> 00:01:53.4600000 : - Oh it's horrible.
New Segment: 00:01:53.4600000 ==> 00:01:55.3000000 : - Shh.
New Segment: 00:01:55.3000000 ==> 00:02:02.3400000 : It was just a bad dream.
New Segment: 00:02:05.4200000 ==> 00:02:09.8800000 : - You don't ever have to be afraid of anything.
New Segment: 00:02:09.8800000 ==> 00:02:12.8000000 : I'll always be here to protect you.
New Segment: 00:02:12.9200000 ==> 00:02:15.5000000 : (gentle music)
New Segment: 00:02:16.4800000 ==> 00:02:19.0600000 : (gentle music)
New Segment: 00:02:19.0600000 ==> 00:02:21.6400000 : (gentle music)
New Segment: 00:02:21.6400000 ==> 00:02:24.2200000 : (gentle music)
New Segment: 00:02:24.5400000 ==> 00:02:27.1200000 : (gentle music)
New Segment: 00:02:27.1200000 ==> 00:02:29.7000000 : (gentle music)
New Segment: 00:02:29.7000000 ==> 00:02:33.1800000 : [Music]
```
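Item <4> only says the output needs tidying. One way to do it is the hypothetical `SegmentLogToSrt` helper below, a minimal sketch that is not part of whisper.net. It assumes the console output has been saved to a text file in exactly the `New Segment: start ==> end : text` form shown in the log above. The helper drops the model-loading lines and numbers the segments. It also rewrites TimeSpan-style timestamps such as `00:00:02.7600000` into SRT's `00:00:02,760` form, using only the .NET base library.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of whisper.net): turns "New Segment" console lines into SRT.
public static class SegmentLogToSrt
{
    // Matches e.g. "New Segment: 00:00:03.6600000 ==> 00:00:05.9000000 : (exhaling)".
    // Start and end are TimeSpan strings; everything after " : " is the caption text.
    private static readonly Regex SegmentLine =
        new Regex(@"^New Segment: (\S+) ==> (\S+) : (.*)$", RegexOptions.Compiled);

    // SRT timestamps are HH:MM:SS,mmm (comma before the milliseconds).
    public static string ToSrtTime(TimeSpan t) =>
        string.Format(CultureInfo.InvariantCulture, "{0:00}:{1:00}:{2:00},{3:000}",
            (int)t.TotalHours, t.Minutes, t.Seconds, t.Milliseconds);

    public static string Convert(IEnumerable<string> logLines)
    {
        var sb = new StringBuilder();
        int index = 1;
        foreach (var raw in logLines)
        {
            var m = SegmentLine.Match(raw.Trim());
            if (!m.Success) continue;   // skip model-loading lines and anything else

            // TimeSpan.Parse accepts both "00:01:17" and "00:00:02.7600000".
            var start = TimeSpan.Parse(m.Groups[1].Value, CultureInfo.InvariantCulture);
            var end = TimeSpan.Parse(m.Groups[2].Value, CultureInfo.InvariantCulture);

            sb.Append(index++).Append('\n');
            sb.Append(ToSrtTime(start)).Append(" --> ").Append(ToSrtTime(end)).Append('\n');
            sb.Append(m.Groups[3].Value.Trim()).Append("\n\n");   // blank line ends the block
        }
        return sb.ToString();
    }
}
```

With the log saved to a file, `SegmentLogToSrt.Convert(File.ReadLines(logPath))` returns the SRT text (`logPath` is a placeholder). The first segment above becomes block `1` with the timing line `00:00:00,000 --> 00:00:02,760`. The blocks use `\n` line endings; switch to `\r\n` if your player expects Windows line endings.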