Years ago I had the idea of adding subtitles to game videos that don't have any.
The conditions weren't right back then, but now they finally seem to be.
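
Whisper is OpenAI's open-source speech recognition software.
It has a .NET version. With a few small changes to that version, you can turn a game video's dialogue into an SRT subtitle file.
After that, batch-translate the SRT file with an online translator and make a few manual adjustments, and the Chinese localization is done.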
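
For the translation step, one way to keep the cue numbers and timings intact is to pull out only the dialogue, one cue per line. You paste that into the online translator and then merge the translated lines back by position. The sketch below assumes a plain SRT file where each cue is an index line, a timing line, and one or more text lines. `ParseSrt`, `ExtractText` and `MergeTranslation` are hypothetical helper names, not part of Whisper.net.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helpers, not part of Whisper.net.
// Splits SRT text into cues: Header = index line + timing line, Text = dialogue.
List<(string Header, string Text)> ParseSrt(string srt)
{
    var cues = new List<(string Header, string Text)>();
    // Normalize line endings, then split on the blank lines between cues.
    string normalized = srt.Replace("\r\n", "\n").Trim();
    foreach (var block in Regex.Split(normalized, @"\n\s*\n"))
    {
        var lines = block.Split('\n');
        if (lines.Length < 3) continue; // skip empty or malformed cues
        // Multi-line dialogue is joined with a space: one cue = one line of text.
        cues.Add((lines[0] + "\n" + lines[1], string.Join(" ", lines.Skip(2))));
    }
    return cues;
}

// One cue per line: this is what gets pasted into the online translator.
string ExtractText(string srt) =>
    string.Join("\n", ParseSrt(srt).Select(c => c.Text));

// Puts translated lines back under the original numbers and timings, by position.
string MergeTranslation(string srt, string translated)
{
    var cues = ParseSrt(srt);
    var lines = translated.Replace("\r\n", "\n").TrimEnd('\n').Split('\n');
    if (lines.Length != cues.Count)
        throw new ArgumentException($"expected {cues.Count} lines, got {lines.Length}");
    return string.Join("\n\n", cues.Select((c, i) => c.Header + "\n" + lines[i])) + "\n";
}
```

If the translator merges or splits lines, `MergeTranslation` throws. This is deliberate: the alternative is silently shifting subtitles onto the wrong timings.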

The project is here:
https://github.com/sandrohanea/whisper.net

It's best to build it with VS2022 (Visual Studio 2022); with other versions you are likely to run into a lot of .NET SDK version problems.

Once it's built, there are a few things to watch out for:
<0> Switch the model file to the large model, ggml-large.bin; it gives noticeably better results.
The trade-off is much longer processing time: I estimate that converting a batch of files takes several hours, or even dozens of hours.

<1> Set Language to "english".

```csharp
// Original sample code: the language came from the options object (opt.Language)
/* var builder = factory.CreateBuilder()
    .WithLanguage(opt.Language);*/

// Modified: always transcribe as English
var builder = factory.CreateBuilder()
    .WithLanguage("english");
```

<2> By default it seems to accept only WAV files, and they must have a 16 kHz sample rate. Convert your audio to this format beforehand, otherwise it fails with an error.
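
Checking the header before feeding a file to the model can save waiting for an error. The sketch below reads the `fmt ` chunk of a RIFF/WAVE file and reports its format. `ReadWavFormat` and `IsWhisperReady` are hypothetical names, and the check covers only the 16 kHz requirement mentioned above.

For the conversion itself, a common choice (assuming ffmpeg is installed) is `ffmpeg -i input.mp4 -ar 16000 -ac 1 -c:a pcm_s16le output.wav`, which writes 16 kHz mono 16-bit PCM.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: reads the "fmt " chunk of a RIFF/WAVE stream.
(int Channels, int SampleRate, int BitsPerSample) ReadWavFormat(Stream stream)
{
    using var reader = new BinaryReader(stream, Encoding.ASCII, leaveOpen: true);
    string Tag() => Encoding.ASCII.GetString(reader.ReadBytes(4));

    if (Tag() != "RIFF") throw new InvalidDataException("not a RIFF file");
    reader.ReadUInt32();                    // total RIFF size, unused here
    if (Tag() != "WAVE") throw new InvalidDataException("not a WAVE file");

    // Walk the chunks until "fmt " appears; LIST and other chunks are skipped.
    while (stream.Position + 8 <= stream.Length)
    {
        string id = Tag();
        uint size = reader.ReadUInt32();
        if (id == "fmt ")
        {
            reader.ReadUInt16();            // audio format (1 = PCM)
            int channels = reader.ReadUInt16();
            int sampleRate = (int)reader.ReadUInt32();
            reader.ReadUInt32();            // byte rate
            reader.ReadUInt16();            // block align
            int bitsPerSample = reader.ReadUInt16();
            return (channels, sampleRate, bitsPerSample);
        }
        // Chunk bodies are padded to an even number of bytes.
        stream.Seek(size + (size % 2), SeekOrigin.Current);
    }
    throw new InvalidDataException("no fmt chunk found");
}

// Hypothetical check for the 16 kHz WAV input the sample expects.
bool IsWhisperReady(string path)
{
    using var file = File.OpenRead(path);
    return ReadWavFormat(file).SampleRate == 16000;
}
```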

<3> Out of the box, the sample transcribes just one example WAV file. It needs to be changed to work in batch, iterating over every file in a directory.
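
Below is a minimal batch driver, assuming the per-file Whisper.net code from the sample has been wrapped in a callback. `transcribeToSrt` and `TranscribeDirectory` are hypothetical names, not Whisper.net APIs. A large-model batch can run for many hours, so the sketch skips any .wav that already has an .srt next to it; that way an interrupted run can simply be restarted.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical batch driver. transcribeToSrt(wavPath, srtPath) stands in for the
// per-file Whisper.net code from the sample; it is not a Whisper.net API.
List<string> TranscribeDirectory(string inputDir, Action<string, string> transcribeToSrt)
{
    var processed = new List<string>();
    // Every .wav under inputDir, subfolders included, in a stable order.
    var wavFiles = Directory
        .EnumerateFiles(inputDir, "*.wav", SearchOption.AllDirectories)
        .OrderBy(p => p, StringComparer.OrdinalIgnoreCase);
    foreach (var wav in wavFiles)
    {
        string srt = Path.ChangeExtension(wav, ".srt");
        if (File.Exists(srt)) continue;   // finished in an earlier run, skip it
        Console.WriteLine($"Transcribing {wav}");
        transcribeToSrt(wav, srt);
        processed.Add(wav);
    }
    return processed;
}
```

This assumes the callback creates the .srt only once the file is finished. Writing to a temporary name and renaming at the end keeps a crash from leaving a half-written file that the next run would skip.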

<4> The output needs a little tidying up before it is valid SRT.

Below is the console output for one WAV file (the opening cinematic of 幽魂, literally "Ghost"):

```
whisper_init_from_file_no_state: loading model from 'ggml-large.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab = 51865
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 1280
whisper_model_load: n_audio_head = 20
whisper_model_load: n_audio_layer = 32
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 1280
whisper_model_load: n_text_head = 20
whisper_model_load: n_text_layer = 32
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 5
whisper_model_load: mem required = 3557.00 MB (+ 71.00 MB per decoder)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: model ctx = 2951.27 MB
whisper_model_load: model size = 2950.66 MB
whisper_init_state: kv self size = 70.00 MB
whisper_init_state: kv cross size = 234.38 MB
New Segment: 00:00:00 ==> 00:00:02.7600000 : (birds chirping)
New Segment: 00:00:03.6600000 ==> 00:00:05.9000000 : (exhaling)
New Segment: 00:00:05.9000000 ==> 00:00:08.6600000 : (birds chirping)
New Segment: 00:00:08.6600000 ==> 00:00:35.1200000 : (gun firing)
New Segment: 00:00:36.1200000 ==> 00:00:38.5400000 : (gun firing)
New Segment: 00:00:39.0600000 ==> 00:00:41.4800000 : (gun firing)
New Segment: 00:00:41.4800000 ==> 00:00:49.4000000 : (tires screeching)
New Segment: 00:00:49.4000000 ==> 00:00:58.5800000 : (glass shattering)
New Segment: 00:00:58.5800000 ==> 00:01:07.7400000 : (singing in foreign language)
New Segment: 00:01:07.7400000 ==> 00:01:11.5800000 : (singing in foreign language)
New Segment: 00:01:11.5800000 ==> 00:01:17 : (tires screeching)
New Segment: 00:01:17 ==> 00:01:24.8400000 : (singing in foreign language)
New Segment: 00:01:24.8400000 ==> 00:01:28.6400000 : (panting)
New Segment: 00:01:36.7800000 ==> 00:01:39.2000000 : (gun firing)
New Segment: 00:01:39.2000000 ==> 00:01:43.4600000 : - Adrian.
New Segment: 00:01:43.4600000 ==> 00:01:45.6200000 : - Oh God.
New Segment: 00:01:45.6200000 ==> 00:01:48.2000000 : - What's the matter sweetheart?
New Segment: 00:01:48.2000000 ==> 00:01:50.4200000 : Oh.
New Segment: 00:01:50.4200000 ==> 00:01:53.4600000 : - Oh it's horrible.
New Segment: 00:01:53.4600000 ==> 00:01:55.3000000 : - Shh.
New Segment: 00:01:55.3000000 ==> 00:02:02.3400000 : It was just a bad dream.
New Segment: 00:02:05.4200000 ==> 00:02:09.8800000 : - You don't ever have to be afraid of anything.
New Segment: 00:02:09.8800000 ==> 00:02:12.8000000 : I'll always be here to protect you.
New Segment: 00:02:12.9200000 ==> 00:02:15.5000000 : (gentle music)
New Segment: 00:02:16.4800000 ==> 00:02:19.0600000 : (gentle music)
New Segment: 00:02:19.0600000 ==> 00:02:21.6400000 : (gentle music)
New Segment: 00:02:21.6400000 ==> 00:02:24.2200000 : (gentle music)
New Segment: 00:02:24.5400000 ==> 00:02:27.1200000 : (gentle music)
New Segment: 00:02:27.1200000 ==> 00:02:29.7000000 : (gentle music)
New Segment: 00:02:29.7000000 ==> 00:02:33.1800000 : [Music]
```
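
For step <4>, the `New Segment:` lines in the console output above already carry everything an SRT cue needs. They just have to be numbered, and their timestamps rewritten into SRT's `hh:mm:ss,mmm` form. The timestamps look like .NET `TimeSpan.ToString()` output, for example `00:00:02.7600000`, or `00:01:17` when the fraction is zero. The sketch below assumes the console output has been saved to a text file. `SegmentsToSrt` and `SrtTime` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical converter: turns "New Segment: start ==> end : text" console lines
// into numbered SRT cues. Non-matching lines (model-loading messages) are ignored.
string SegmentsToSrt(IEnumerable<string> consoleLines)
{
    var pattern = new Regex(@"^New Segment: (\S+) ==> (\S+) : (.*)$");
    var sb = new StringBuilder();
    int index = 1;
    foreach (var line in consoleLines)
    {
        var m = pattern.Match(line.Trim());
        if (!m.Success) continue;
        // Parse the TimeSpan-style timestamps (e.g. 00:00:02.7600000 or 00:01:17).
        var start = TimeSpan.Parse(m.Groups[1].Value, CultureInfo.InvariantCulture);
        var end = TimeSpan.Parse(m.Groups[2].Value, CultureInfo.InvariantCulture);
        sb.Append(index++).Append('\n')
          .Append(SrtTime(start)).Append(" --> ").Append(SrtTime(end)).Append('\n')
          .Append(m.Groups[3].Value.Trim()).Append("\n\n");
    }
    return sb.ToString();
}

// SRT timestamps are hh:mm:ss,mmm, with a comma before the milliseconds.
string SrtTime(TimeSpan t) =>
    $"{(int)t.TotalHours:00}:{t.Minutes:00}:{t.Seconds:00},{t.Milliseconds:000}";
```

For example, `File.WriteAllText("intro.srt", SegmentsToSrt(File.ReadLines("intro.log")))` (file names are illustrative) produces a file ready for the translation step. Sound descriptions such as `(gentle music)` are kept as ordinary cues; filter them out first if they aren't wanted in the subtitles.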