Years ago I had the idea of adding subtitles to game videos that ship without them.
The conditions were not ripe back then, but they now seem to be.

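Whisper is OpenAI's open-source speech recognition software.
There is a .NET version of it, and with a few small modifications to that version, the speech in a game video can be transcribed into subtitles in SRT format.
After that, the SRT file is batch-translated online and lightly adjusted, and the Chinese localization is done.
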
The repository is here:
https://github.com/sandrohanea/whisper.net

It is best to build with VS2022; otherwise you will run into many .NET SDK version problems.

Once it is built, there are a few things to note.

<0> Change the model file to the large model, ggml-large.bin, which gives better results.
It also takes longer, of course: converting a batch of files is likely to take several hours, or even tens of hours.

<1> Set Language to "english".

    // Original: language read from opt.Language.
    /* var builder = factory.CreateBuilder()
        .WithLanguage(opt.Language); */
    // Modified: always transcribe as English.
    var builder = factory.CreateBuilder()
        .WithLanguage("english");

<2> By default it seems to support only WAV input, and the audio must have a 16 kHz sample rate. Files need to be converted to this format beforehand, otherwise an error occurs.

<3> The default sample converts only a single example WAV file; this needs to be changed to batch processing
(iterating over all files in a directory).

<4> The output needs a little tidying up to conform to the SRT format.

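The WAV and batch-processing notes above can also be handled outside the C# project. The following is a minimal sketch in Python, assuming ffmpeg is installed and on the PATH. The function names, the list of video extensions and the mono setting are my own assumptions (the post only mentions WAV and the 16 kHz sample rate), and none of this is part of Whisper.net. It walks one directory and extracts the audio of every video as 16 kHz, 16-bit PCM WAV.

```python
import subprocess
from pathlib import Path

# Illustrative list; adjust to whatever container formats the game videos use.
VIDEO_EXTENSIONS = {".mp4", ".mkv", ".avi", ".mov"}


def find_videos(root):
    """Return the video files directly inside root, sorted for a stable order."""
    return sorted(
        p for p in Path(root).iterdir()
        if p.is_file() and p.suffix.lower() in VIDEO_EXTENSIONS
    )


def ffmpeg_args(src, dst):
    """Build an ffmpeg command line that writes 16 kHz mono 16-bit PCM WAV."""
    return [
        "ffmpeg", "-y",        # overwrite an existing output file
        "-i", str(src),
        "-vn",                 # ignore the video stream
        "-ar", "16000",        # 16 kHz sample rate, as required in the post
        "-ac", "1",            # mono (assumption, not stated in the post)
        "-c:a", "pcm_s16le",   # 16-bit PCM samples
        str(dst),
    ]


def convert_all(root, out_dir):
    """Convert every video in root; returns the WAV paths that were written."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in find_videos(root):
        # Note: two videos with the same stem would map to the same WAV name.
        dst = out_dir / (src.stem + ".wav")
        subprocess.run(ffmpeg_args(src, dst), check=True)
        written.append(dst)
    return written
```

Calling `convert_all("videos", "wav")` would then leave one WAV file per video in the `wav` directory (both directory names are placeholders), ready to be fed to the modified sample program.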
Below is the console output for one WAV file (the opening cinematic of 幽魂):

    whisper_init_from_file_no_state: loading model from 'ggml-large.bin'
    whisper_model_load: loading model
    whisper_model_load: n_vocab = 51865
    whisper_model_load: n_audio_ctx = 1500
    whisper_model_load: n_audio_state = 1280
    whisper_model_load: n_audio_head = 20
    whisper_model_load: n_audio_layer = 32
    whisper_model_load: n_text_ctx = 448
    whisper_model_load: n_text_state = 1280
    whisper_model_load: n_text_head = 20
    whisper_model_load: n_text_layer = 32
    whisper_model_load: n_mels = 80
    whisper_model_load: ftype = 1
    whisper_model_load: qntvr = 0
    whisper_model_load: type = 5
    whisper_model_load: mem required = 3557.00 MB (+ 71.00 MB per decoder)
    whisper_model_load: adding 1608 extra tokens
    whisper_model_load: model ctx = 2951.27 MB
    whisper_model_load: model size = 2950.66 MB
    whisper_init_state: kv self size = 70.00 MB
    whisper_init_state: kv cross size = 234.38 MB
    New Segment: 00:00:00 ==> 00:00:02.7600000 : (birds chirping)
    New Segment: 00:00:03.6600000 ==> 00:00:05.9000000 : (exhaling)
    New Segment: 00:00:05.9000000 ==> 00:00:08.6600000 : (birds chirping)
    New Segment: 00:00:08.6600000 ==> 00:00:35.1200000 : (gun firing)
    New Segment: 00:00:36.1200000 ==> 00:00:38.5400000 : (gun firing)
    New Segment: 00:00:39.0600000 ==> 00:00:41.4800000 : (gun firing)
    New Segment: 00:00:41.4800000 ==> 00:00:49.4000000 : (tires screeching)
    New Segment: 00:00:49.4000000 ==> 00:00:58.5800000 : (glass shattering)
    New Segment: 00:00:58.5800000 ==> 00:01:07.7400000 : (singing in foreign language)
    New Segment: 00:01:07.7400000 ==> 00:01:11.5800000 : (singing in foreign language)
    New Segment: 00:01:11.5800000 ==> 00:01:17 : (tires screeching)
    New Segment: 00:01:17 ==> 00:01:24.8400000 : (singing in foreign language)
    New Segment: 00:01:24.8400000 ==> 00:01:28.6400000 : (panting)
    New Segment: 00:01:36.7800000 ==> 00:01:39.2000000 : (gun firing)
    New Segment: 00:01:39.2000000 ==> 00:01:43.4600000 : - Adrian.
    New Segment: 00:01:43.4600000 ==> 00:01:45.6200000 : - Oh God.
    New Segment: 00:01:45.6200000 ==> 00:01:48.2000000 : - What's the matter sweetheart?
    New Segment: 00:01:48.2000000 ==> 00:01:50.4200000 : Oh.
    New Segment: 00:01:50.4200000 ==> 00:01:53.4600000 : - Oh it's horrible.
    New Segment: 00:01:53.4600000 ==> 00:01:55.3000000 : - Shh.
    New Segment: 00:01:55.3000000 ==> 00:02:02.3400000 : It was just a bad dream.
    New Segment: 00:02:05.4200000 ==> 00:02:09.8800000 : - You don't ever have to be afraid of anything.
    New Segment: 00:02:09.8800000 ==> 00:02:12.8000000 : I'll always be here to protect you.
    New Segment: 00:02:12.9200000 ==> 00:02:15.5000000 : (gentle music)
    New Segment: 00:02:16.4800000 ==> 00:02:19.0600000 : (gentle music)
    New Segment: 00:02:19.0600000 ==> 00:02:21.6400000 : (gentle music)
    New Segment: 00:02:21.6400000 ==> 00:02:24.2200000 : (gentle music)
    New Segment: 00:02:24.5400000 ==> 00:02:27.1200000 : (gentle music)
    New Segment: 00:02:27.1200000 ==> 00:02:29.7000000 : (gentle music)
    New Segment: 00:02:29.7000000 ==> 00:02:33.1800000 : [Music]
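
The "New Segment" lines in this output already carry everything an SRT file needs, so the tidying step can be done by a small script after the fact. The sketch below is in Python and is only one possible approach, not what the author actually used. The names `SEGMENT_RE`, `to_srt_time` and `log_to_srt` are made up for illustration, and it assumes every segment line has the exact shape shown above (hours:minutes:seconds with an optional fraction, no day component).

```python
import re

# Matches lines such as:
#   New Segment: 00:00:03.6600000 ==> 00:00:05.9000000 : (exhaling)
SEGMENT_RE = re.compile(
    r"New Segment:\s*(\d+):(\d+):(\d+)(?:\.(\d+))?\s*==>\s*"
    r"(\d+):(\d+):(\d+)(?:\.(\d+))?\s*:\s?(.*)$"
)


def to_srt_time(h, m, s, frac):
    """Format a parsed timestamp as HH:MM:SS,mmm (fraction truncated to ms)."""
    millis = ((frac or "") + "000")[:3]
    return "%02d:%02d:%02d,%s" % (int(h), int(m), int(s), millis)


def log_to_srt(log_text):
    """Turn 'New Segment' log lines into SRT text; all other lines are ignored."""
    blocks = []
    for line in log_text.splitlines():
        match = SEGMENT_RE.search(line)
        if not match:
            continue  # model-loading messages, blank lines, and so on
        g = match.groups()
        text = g[8].strip()
        if not text:
            continue  # skip segments without any text
        start = to_srt_time(*g[0:4])
        end = to_srt_time(*g[4:8])
        # SRT block: running index, time range, text, then a blank separator.
        blocks.append("%d\n%s --> %s\n%s\n" % (len(blocks) + 1, start, end, text))
    return "\n".join(blocks)
```

For the second segment above, this yields the time line `00:00:03,660 --> 00:00:05,900` followed by `(exhaling)`. Sound descriptions such as `(gun firing)` and the leading `- ` on dialogue lines are passed through unchanged; whether to keep them is an editing decision for the later adjustment pass.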