For years I have wanted to add subtitles to game videos that don't come with any.
The conditions weren't right back then, but now they seem to be.

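Whisper is OpenAI's open-source speech recognition software.
It has a .NET version, and with a few small modifications that version can transcribe the speech in a game video into subtitles in SRT format.
The SRT file can then be batch-translated online and given a few manual touch-ups, and the Chinese localization is done.
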
The project is here:
https://github.com/sandrohanea/whisper.net

It is best to build it with VS2022; otherwise you are likely to run into a lot of problems with .NET SDK versions.

Once it builds, there are a few things to note:

<0> Switch the model file to the large model, ggml-large.bin, which gives better results.
The downside is that it is also slower: converting a batch of files may take several hours, or even dozens of hours.

<1> Set Language to "english" rather than taking it from opt.Language:

```csharp
// Original sample: the language comes from the options object
/* var builder = factory.CreateBuilder()
       .WithLanguage(opt.Language); */

// Changed: always transcribe as English
var builder = factory.CreateBuilder()
    .WithLanguage("english");
```

<2> By default it seems to accept only WAV files, and only at a 16 kHz sample rate, so the audio has to be converted to that format beforehand; otherwise it fails with an error.

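Because a wrong sample rate only shows up as an error once processing starts, it can help to check the files up front. The sketch below is a hypothetical helper (the WavCheck class and its methods are illustrative names, not part of Whisper.net) that reads the format from a standard RIFF/WAVE header by walking its chunks until it reaches the "fmt " chunk; it does not convert anything. For the conversion itself a tool such as ffmpeg works, for example `ffmpeg -i input.mp4 -ar 16000 -ac 1 -c:a pcm_s16le output.wav` (16 kHz, mono, 16-bit PCM).

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: reads the format fields of a RIFF/WAVE file.
public static class WavCheck
{
    public record WavFormat(ushort AudioFormat, ushort Channels, uint SampleRate, ushort BitsPerSample);

    public static WavFormat ReadFormat(Stream stream)
    {
        using var reader = new BinaryReader(stream, Encoding.ASCII, leaveOpen: true);
        if (Encoding.ASCII.GetString(reader.ReadBytes(4)) != "RIFF")
            throw new InvalidDataException("Not a RIFF file.");
        reader.ReadUInt32(); // overall RIFF size, not needed here
        if (Encoding.ASCII.GetString(reader.ReadBytes(4)) != "WAVE")
            throw new InvalidDataException("Not a WAVE file.");

        // Each chunk: 4-byte id, 4-byte little-endian size, payload padded to an even length.
        while (stream.Position + 8 <= stream.Length)
        {
            string id = Encoding.ASCII.GetString(reader.ReadBytes(4));
            uint size = reader.ReadUInt32();
            if (id == "fmt ")
            {
                ushort audioFormat = reader.ReadUInt16(); // 1 = integer PCM
                ushort channels = reader.ReadUInt16();
                uint sampleRate = reader.ReadUInt32();
                reader.ReadUInt32();                      // byte rate
                reader.ReadUInt16();                      // block align
                ushort bits = reader.ReadUInt16();
                return new WavFormat(audioFormat, channels, sampleRate, bits);
            }
            // Skip any other chunk (LIST, etc.), including its pad byte.
            stream.Seek(size + (size & 1), SeekOrigin.Current);
        }
        throw new InvalidDataException("No fmt chunk found.");
    }

    // True when the file's sample rate is the 16 kHz that the stock setup expects.
    public static bool Is16kHz(string path)
    {
        using var fs = File.OpenRead(path);
        return ReadFormat(fs).SampleRate == 16000;
    }
}
```

For example, calling WavCheck.Is16kHz on every file in the input folder before a long batch run lists the ones that still need converting.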
<3> The stock sample only converts one example WAV file; it needs to be changed to run in batch (iterating over every file in a directory).
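A minimal batch driver, assuming the per-file transcription is wrapped in a function that takes a WAV path and returns SRT text. BatchRunner and the transcribeToSrt callback are hypothetical names, and the actual Whisper.net calls are deliberately left out. Skipping files that already have an .srt is an optional extra (not something the sample does), meant for resuming a run that the large model makes long.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical batch driver: runs a transcription callback on every .wav file
// in one directory and writes the result next to it as an .srt file.
public static class BatchRunner
{
    // transcribeToSrt is a placeholder for the per-file Whisper.net processing;
    // it receives the .wav path and returns the finished SRT text.
    public static List<string> Run(string directory, Func<string, string> transcribeToSrt, bool skipExisting = true)
    {
        var written = new List<string>();

        // Sort so progress is predictable; enumeration order is not guaranteed.
        var files = new List<string>(Directory.EnumerateFiles(directory, "*.wav", SearchOption.TopDirectoryOnly));
        files.Sort(StringComparer.OrdinalIgnoreCase);

        foreach (string wavPath in files)
        {
            string srtPath = Path.ChangeExtension(wavPath, ".srt");

            // Allows resuming an interrupted run without redoing finished files.
            if (skipExisting && File.Exists(srtPath))
                continue;

            Console.WriteLine($"Transcribing {Path.GetFileName(wavPath)} ...");
            string srt = transcribeToSrt(wavPath);
            File.WriteAllText(srtPath, srt);
            written.Add(srtPath);
        }
        return written;
    }
}
```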
<4> The output needs a little tidying up to conform to the SRT format.

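Comparing the console output below with SRT shows what the tidying involves: Whisper.net prints timestamps such as 00:00:02.7600000 joined by "==>", while an SRT cue is a sequence number, a timing line such as 00:00:02,760 --> 00:00:05,900 (comma before the milliseconds), the text, and a blank line. The sketch below assumes the segments are already available as start/end TimeSpans plus text; SrtWriter is a hypothetical helper, not a Whisper.net API. It renumbers cues from 1 and drops segments whose text is empty.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Hypothetical helper: writes (start, end, text) segments as SRT cues.
public static class SrtWriter
{
    // Formats a time as HH:MM:SS,mmm; hours are not wrapped at 24.
    public static string FormatTime(TimeSpan t)
    {
        long totalMs = (long)Math.Round(t.TotalMilliseconds);
        long h = totalMs / 3_600_000;
        long m = totalMs / 60_000 % 60;
        long s = totalMs / 1000 % 60;
        long ms = totalMs % 1000;
        return string.Format(CultureInfo.InvariantCulture, "{0:00}:{1:00}:{2:00},{3:000}", h, m, s, ms);
    }

    public static string Write(IEnumerable<(TimeSpan Start, TimeSpan End, string Text)> segments)
    {
        var sb = new StringBuilder();
        int index = 1;
        foreach (var (start, end, text) in segments)
        {
            string line = text.Trim();
            if (line.Length == 0) continue; // no blank cues
            sb.Append(index++).Append('\n')
              .Append(FormatTime(start)).Append(" --> ").Append(FormatTime(end)).Append('\n')
              .Append(line).Append('\n')
              .Append('\n');
        }
        return sb.ToString();
    }
}
```

It writes '\n' line endings; if a player insists on Windows line endings, append "\r\n" instead.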
Below is the console output for one WAV file (the opening cinematic of 幽魂):

```
whisper_init_from_file_no_state: loading model from 'ggml-large.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab = 51865
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 1280
whisper_model_load: n_audio_head = 20
whisper_model_load: n_audio_layer = 32
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 1280
whisper_model_load: n_text_head = 20
whisper_model_load: n_text_layer = 32
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 5
whisper_model_load: mem required = 3557.00 MB (+ 71.00 MB per decoder)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: model ctx = 2951.27 MB
whisper_model_load: model size = 2950.66 MB
whisper_init_state: kv self size = 70.00 MB
whisper_init_state: kv cross size = 234.38 MB
New Segment: 00:00:00 ==> 00:00:02.7600000 : (birds chirping)
New Segment: 00:00:03.6600000 ==> 00:00:05.9000000 : (exhaling)
New Segment: 00:00:05.9000000 ==> 00:00:08.6600000 : (birds chirping)
New Segment: 00:00:08.6600000 ==> 00:00:35.1200000 : (gun firing)
New Segment: 00:00:36.1200000 ==> 00:00:38.5400000 : (gun firing)
New Segment: 00:00:39.0600000 ==> 00:00:41.4800000 : (gun firing)
New Segment: 00:00:41.4800000 ==> 00:00:49.4000000 : (tires screeching)
New Segment: 00:00:49.4000000 ==> 00:00:58.5800000 : (glass shattering)
New Segment: 00:00:58.5800000 ==> 00:01:07.7400000 : (singing in foreign language)
New Segment: 00:01:07.7400000 ==> 00:01:11.5800000 : (singing in foreign language)
New Segment: 00:01:11.5800000 ==> 00:01:17 : (tires screeching)
New Segment: 00:01:17 ==> 00:01:24.8400000 : (singing in foreign language)
New Segment: 00:01:24.8400000 ==> 00:01:28.6400000 : (panting)
New Segment: 00:01:36.7800000 ==> 00:01:39.2000000 : (gun firing)
New Segment: 00:01:39.2000000 ==> 00:01:43.4600000 : - Adrian.
New Segment: 00:01:43.4600000 ==> 00:01:45.6200000 : - Oh God.
New Segment: 00:01:45.6200000 ==> 00:01:48.2000000 : - What's the matter sweetheart?
New Segment: 00:01:48.2000000 ==> 00:01:50.4200000 : Oh.
New Segment: 00:01:50.4200000 ==> 00:01:53.4600000 : - Oh it's horrible.
New Segment: 00:01:53.4600000 ==> 00:01:55.3000000 : - Shh.
New Segment: 00:01:55.3000000 ==> 00:02:02.3400000 : It was just a bad dream.
New Segment: 00:02:05.4200000 ==> 00:02:09.8800000 : - You don't ever have to be afraid of anything.
New Segment: 00:02:09.8800000 ==> 00:02:12.8000000 : I'll always be here to protect you.
New Segment: 00:02:12.9200000 ==> 00:02:15.5000000 : (gentle music)
New Segment: 00:02:16.4800000 ==> 00:02:19.0600000 : (gentle music)
New Segment: 00:02:19.0600000 ==> 00:02:21.6400000 : (gentle music)
New Segment: 00:02:21.6400000 ==> 00:02:24.2200000 : (gentle music)
New Segment: 00:02:24.5400000 ==> 00:02:27.1200000 : (gentle music)
New Segment: 00:02:27.1200000 ==> 00:02:29.7000000 : (gentle music)
New Segment: 00:02:29.7000000 ==> 00:02:33.1800000 : [Music]
```
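If the sample is left printing New Segment lines as above, a low-effort route to SRT is to capture that output and parse it. The hypothetical SegmentLogParser below assumes exactly the line layout shown in this log and skips the whisper_* model-loading lines; its (start, end, text) tuples are the shape an SRT writer needs. Sound descriptions such as (gentle music) come through as ordinary segments, so whether to keep them is a separate editing decision.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical parser for console lines of the form
//   New Segment: 00:00:03.6600000 ==> 00:00:05.9000000 : (exhaling)
// Any line that does not match (e.g. whisper_model_load: ...) is ignored.
public static class SegmentLogParser
{
    private static readonly Regex LinePattern = new Regex(
        @"^\s*New Segment:\s*(?<start>\S+)\s*==>\s*(?<end>\S+)\s*:\s?(?<text>.*)$",
        RegexOptions.Compiled);

    public static List<(TimeSpan Start, TimeSpan End, string Text)> Parse(IEnumerable<string> lines)
    {
        var result = new List<(TimeSpan Start, TimeSpan End, string Text)>();
        foreach (string line in lines)
        {
            Match m = LinePattern.Match(line);
            if (!m.Success) continue;

            // Times with a zero fraction appear as e.g. 00:01:17 (consistent with
            // TimeSpan's default "c" format), so use TimeSpan.Parse, not a fixed pattern.
            TimeSpan start = TimeSpan.Parse(m.Groups["start"].Value, CultureInfo.InvariantCulture);
            TimeSpan end = TimeSpan.Parse(m.Groups["end"].Value, CultureInfo.InvariantCulture);
            result.Add((start, end, m.Groups["text"].Value.Trim()));
        }
        return result;
    }
}
```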