Years ago I had the idea of adding subtitles to game videos that ship without them.
The conditions weren't ripe back then, but they now seem to be.

Whisper is OpenAI's open-source speech recognition software.
It has a .NET version; with a few small modifications to that version, the speech in a game video can be recognized and written out as subtitles in SRT format.
The SRT file is then batch-translated online and lightly adjusted by hand, and the Chinese localization is done.

The repository is here:
https://github.com/sandrohanea/whisper.net

It is best to build with VS2022; otherwise you will run into a lot of .NET SDK version problems.

Once it builds, there are a few things to watch out for.

<0> Change the model file to the large model, ggml-large.bin; this model gives noticeably better results.
Of course, it also takes much longer: converting a batch of files will probably take several hours, or even tens of hours.
<1> Language must be set to "english".

    // Original demo code: the language came from the command-line options.
    /* var builder = factory.CreateBuilder()
        .WithLanguage(opt.Language);*/

    // Modified: always transcribe as English.
    var builder = factory.CreateBuilder()
        .WithLanguage("english");

<2> By default it seems to support only WAV input, and the audio must have a 16 kHz sample rate. Files have to be converted to this format beforehand, otherwise an error occurs.

<3> By default it only converts a single sample WAV file; this needs to be changed to batch processing (iterating over all the files in a directory).

<4> The output needs a little tidying so that it conforms to the SRT format.
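
Notes <2> and <3> can be handled together in a preprocessing step that runs before Whisper. The following is only a rough sketch, not the code I actually used: it assumes ffmpeg is installed and on the PATH (ffmpeg is my own choice of converter here, and nothing in whisper.net requires it). The class name `BatchConvert`, its method names, and the list of extensions are made up for illustration. Downmixing to mono is also an assumption on my part; the only hard requirement noted above is 16 kHz WAV.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;

static class BatchConvert
{
    // Extensions treated as input videos (assumption: adjust to your own files).
    static readonly string[] MediaExtensions = { ".mp4", ".avi", ".mkv" };

    // Builds ffmpeg arguments for a 16 kHz, mono, 16-bit PCM WAV file.
    public static List<string> BuildFfmpegArgs(string input, string output) => new()
    {
        "-y",                 // overwrite the output file if it already exists
        "-i", input,          // source video file
        "-vn",                // drop the video stream
        "-ar", "16000",       // resample the audio to 16 kHz
        "-ac", "1",           // downmix to mono (assumption, see the note above)
        "-c:a", "pcm_s16le",  // 16-bit little-endian PCM
        output
    };

    // Lists the media files in a directory (non-recursive), sorted by name.
    public static IEnumerable<string> FindMediaFiles(string directory) =>
        Directory.EnumerateFiles(directory)
                 .Where(f => MediaExtensions.Contains(Path.GetExtension(f).ToLowerInvariant()))
                 .OrderBy(f => f, StringComparer.OrdinalIgnoreCase);

    // Converts every media file in inputDir to outputDir/<name>.wav.
    public static void ConvertAll(string inputDir, string outputDir)
    {
        Directory.CreateDirectory(outputDir);
        foreach (var input in FindMediaFiles(inputDir))
        {
            var output = Path.Combine(outputDir, Path.GetFileNameWithoutExtension(input) + ".wav");
            var psi = new ProcessStartInfo("ffmpeg") { UseShellExecute = false };
            foreach (var arg in BuildFfmpegArgs(input, output))
                psi.ArgumentList.Add(arg);

            using var process = Process.Start(psi);
            if (process is null)
            {
                Console.Error.WriteLine($"Could not start ffmpeg for {input}");
                continue;
            }
            process.WaitForExit();
            if (process.ExitCode != 0)
                Console.Error.WriteLine($"ffmpeg failed for {input} (exit code {process.ExitCode})");
        }
    }
}
```

Calling `BatchConvert.ConvertAll(videoDir, wavDir)` with two directories of your own would leave one WAV per video in `wavDir`, and the modified demo can then loop over that directory in the same way `FindMediaFiles` does.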

Below is the console output for one WAV file (the opening cinematic of 幽魂):

    whisper_init_from_file_no_state: loading model from 'ggml-large.bin'
    whisper_model_load: loading model
    whisper_model_load: n_vocab = 51865
    whisper_model_load: n_audio_ctx = 1500
    whisper_model_load: n_audio_state = 1280
    whisper_model_load: n_audio_head = 20
    whisper_model_load: n_audio_layer = 32
    whisper_model_load: n_text_ctx = 448
    whisper_model_load: n_text_state = 1280
    whisper_model_load: n_text_head = 20
    whisper_model_load: n_text_layer = 32
    whisper_model_load: n_mels = 80
    whisper_model_load: ftype = 1
    whisper_model_load: qntvr = 0
    whisper_model_load: type = 5
    whisper_model_load: mem required = 3557.00 MB (+ 71.00 MB per decoder)
    whisper_model_load: adding 1608 extra tokens
    whisper_model_load: model ctx = 2951.27 MB
    whisper_model_load: model size = 2950.66 MB
    whisper_init_state: kv self size = 70.00 MB
    whisper_init_state: kv cross size = 234.38 MB
    New Segment: 00:00:00 ==> 00:00:02.7600000 : (birds chirping)
    New Segment: 00:00:03.6600000 ==> 00:00:05.9000000 : (exhaling)
    New Segment: 00:00:05.9000000 ==> 00:00:08.6600000 : (birds chirping)
    New Segment: 00:00:08.6600000 ==> 00:00:35.1200000 : (gun firing)
    New Segment: 00:00:36.1200000 ==> 00:00:38.5400000 : (gun firing)
    New Segment: 00:00:39.0600000 ==> 00:00:41.4800000 : (gun firing)
    New Segment: 00:00:41.4800000 ==> 00:00:49.4000000 : (tires screeching)
    New Segment: 00:00:49.4000000 ==> 00:00:58.5800000 : (glass shattering)
    New Segment: 00:00:58.5800000 ==> 00:01:07.7400000 : (singing in foreign language)
    New Segment: 00:01:07.7400000 ==> 00:01:11.5800000 : (singing in foreign language)
    New Segment: 00:01:11.5800000 ==> 00:01:17 : (tires screeching)
    New Segment: 00:01:17 ==> 00:01:24.8400000 : (singing in foreign language)
    New Segment: 00:01:24.8400000 ==> 00:01:28.6400000 : (panting)
    New Segment: 00:01:36.7800000 ==> 00:01:39.2000000 : (gun firing)
    New Segment: 00:01:39.2000000 ==> 00:01:43.4600000 : - Adrian.
    New Segment: 00:01:43.4600000 ==> 00:01:45.6200000 : - Oh God.
    New Segment: 00:01:45.6200000 ==> 00:01:48.2000000 : - What's the matter sweetheart?
    New Segment: 00:01:48.2000000 ==> 00:01:50.4200000 : Oh.
    New Segment: 00:01:50.4200000 ==> 00:01:53.4600000 : - Oh it's horrible.
    New Segment: 00:01:53.4600000 ==> 00:01:55.3000000 : - Shh.
    New Segment: 00:01:55.3000000 ==> 00:02:02.3400000 : It was just a bad dream.
    New Segment: 00:02:05.4200000 ==> 00:02:09.8800000 : - You don't ever have to be afraid of anything.
    New Segment: 00:02:09.8800000 ==> 00:02:12.8000000 : I'll always be here to protect you.
    New Segment: 00:02:12.9200000 ==> 00:02:15.5000000 : (gentle music)
    New Segment: 00:02:16.4800000 ==> 00:02:19.0600000 : (gentle music)
    New Segment: 00:02:19.0600000 ==> 00:02:21.6400000 : (gentle music)
    New Segment: 00:02:21.6400000 ==> 00:02:24.2200000 : (gentle music)
    New Segment: 00:02:24.5400000 ==> 00:02:27.1200000 : (gentle music)
    New Segment: 00:02:27.1200000 ==> 00:02:29.7000000 : (gentle music)
    New Segment: 00:02:29.7000000 ==> 00:02:33.1800000 : [Music]
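
For note <4>, one way to do the tidying is to take the "New Segment" lines from the console output and rewrite them as numbered SRT cues. The sketch below is just one possible approach under that assumption, not the exact change I made to the demo. The class `SrtWriter` and its methods are hypothetical names, and the regular expression assumes the log line layout shown above. SRT timestamps are written as HH:MM:SS,mmm with a comma before the milliseconds, so the .NET `TimeSpan` text has to be reformatted.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

static class SrtWriter
{
    // Matches lines such as:
    // "New Segment: 00:00:03.6600000 ==> 00:00:05.9000000 : (exhaling)"
    static readonly Regex SegmentLine =
        new(@"New Segment:\s*(?<start>\S+)\s*==>\s*(?<end>\S+)\s*:\s*(?<text>.*)$");

    // Formats a TimeSpan as an SRT timestamp, e.g. 00:01:39,200.
    public static string FormatTime(TimeSpan t) =>
        $"{(int)t.TotalHours:00}:{t.Minutes:00}:{t.Seconds:00},{t.Milliseconds:000}";

    // Converts console log lines to SRT text; non-segment lines are skipped.
    public static string FromLog(IEnumerable<string> lines)
    {
        var srt = new StringBuilder();
        var index = 1;
        foreach (var line in lines)
        {
            var m = SegmentLine.Match(line);
            if (!m.Success) continue;   // e.g. the whisper_model_load lines

            var start = TimeSpan.Parse(m.Groups["start"].Value, CultureInfo.InvariantCulture);
            var end = TimeSpan.Parse(m.Groups["end"].Value, CultureInfo.InvariantCulture);
            var text = m.Groups["text"].Value.Trim();
            if (text.Length == 0) continue;   // skip empty segments

            // One cue: running number, time range, text, then a blank line.
            srt.Append(index++).Append('\n');
            srt.Append(FormatTime(start)).Append(" --> ").Append(FormatTime(end)).Append('\n');
            srt.Append(text).Append("\n\n");
        }
        return srt.ToString();
    }
}
```

Feeding the saved console output through something like `File.WriteAllText(srtPath, SrtWriter.FromLog(File.ReadLines(logPath)))` would give a file whose first cue is numbered 1, runs from 00:00:00,000 to 00:00:02,760, and contains "(birds chirping)". Sound-effect cues like that one can be deleted by hand before translating.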