Opened 5 years ago
Closed 4 years ago
#666 closed defect (fixed)
problem with opening non unicode subtitle file
| Reported by: | Spockie | Owned by: | verm |
|---|---|---|---|
| Priority: | normal | Milestone: | 3.0.0 |
| Component: | Subtitle | Version: | 2.1.6 |
| Severity: | major | Keywords: | universalchardet |
| Cc: | | Platform: | All |
| Sub Component: | | | |
Description
The subtitle file contains text in code page win-1250.
The default charset (in Windows) is set to win-1250.
When opening that subtitle file with Aegisub r1847, it shows a message that it could not narrow the character set down to a single one, and offers several character sets to choose from (win-1250 is not among them).
working workaround:
Before opening the file in Aegisub, convert it from win-1250 to UTF-8 in any text editor capable of doing so (e.g. Notepad, PSPad, ...)
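For anyone who would rather script this workaround than use an editor, here is a minimal sketch in Python. The function name `convert_to_utf8` and the file names are made up for illustration, and it assumes the file really is win-1250 (Python's codec name for it is `cp1250`).

```python
from pathlib import Path


def convert_to_utf8(src, dst, source_encoding="cp1250"):
    """Re-encode a subtitle file to UTF-8 (hypothetical helper).

    source_encoding is an assumption: cp1250 is Python's name for win-1250.
    """
    # Decode strictly so a wrong guess raises UnicodeDecodeError
    # instead of silently producing garbage.
    text = Path(src).read_bytes().decode(source_encoding)
    # Work on raw bytes so line endings are left exactly as they were.
    Path(dst).write_bytes(text.encode("utf-8"))
    return text
```

Usage would be something like `convert_to_utf8("episode.ssa", "episode.utf8.ssa")`, after which the new file should open without the charset dialog.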
btw: Aegisub v1.10 works fine
- it probably just converts from the "default windows charset" to utf-8
- which is perfectly fine for a subtitle file that is in the "default windows charset"
proposed solution:
a) add win-1250 to supported charsets
b) "automagically" add "default windows charset" to supported charsets
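To make proposal b) concrete, here is a rough sketch in Python of what "add the default charset to the offered list" could look like. This is not Aegisub's code: `with_local_charset` and the `(name, score)` tuple shape are assumptions for illustration. The only real API used is `locale.getpreferredencoding`.

```python
import locale


def with_local_charset(detected, local=None):
    """Append the system default charset to a detector's candidate list.

    detected: list of (charset_name, score) pairs (hypothetical shape).
    The local entry gets score None, because the detector never rated it.
    """
    if local is None:
        # Ask the OS for its preferred encoding, e.g. "cp1250" on a
        # Central European Windows install.
        local = locale.getpreferredencoding(False)
    known = {name.lower() for name, _score in detected}
    result = list(detected)
    if local.lower() not in known:
        result.append((local, None))
    return result
```

With `detected = [("IBM866", 0.6), ("ISO-8859-2", 0.4)]` and `local="windows-1250"`, the result keeps both detected entries and adds `("windows-1250", None)` at the end.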
ADDITIONAL INFORMATION:
The above information is slightly outdated, see notes. New info (copied from #806):
Opening via File -> "Open Subtitles with Charset..." and choosing charset WINDOWS-1250 works.
When opening normally, it lists several badly detected charsets with a certainty score, plus "Unknown - windows-1250 (local)". However, when I choose the windows-1250 line, it doesn't work correctly. In some cases all non-ASCII characters get converted to nonsense. It seems to me that even when I choose windows-1250, the converting function is called with a "source charset" parameter other than windows-1250.
In at least one case I get two error messages "Cannot convert from the charset 'IBM866'!". When opening that file, IBM866 is listed among the detected charsets, but not with the highest score.
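The symptom described above is what you would expect if the bytes were decoded with one of the other listed charsets instead of the selected one. A small Python sketch of that suspicion follows. It does not reproduce Aegisub's behaviour: `decode_selected` and the candidate list are invented for the example, and `cp1250` / `cp866` are Python's codec names for windows-1250 and IBM866.

```python
def decode_selected(raw, candidates, selected_index):
    """Decode raw bytes with the charset of the entry the user picked.

    candidates: list of (charset_name, score) pairs (hypothetical shape).
    The point is that the name passed to decode() must come from the
    selected row, not from some other row such as the top-scored one.
    """
    charset, _score = candidates[selected_index]
    return raw.decode(charset)


original = "Příliš žluťoučký kůň"
raw = original.encode("cp1250")  # bytes as stored in a win-1250 file

candidates = [("cp866", 0.5), ("cp1250", None)]
good = decode_selected(raw, candidates, 1)  # the "local" row
bad = decode_selected(raw, candidates, 0)   # a wrongly detected row

print(good == original)  # the selected charset round-trips the text
print(bad == original)   # the wrong charset turns accented letters into other characters
```

If picking the windows-1250 row gives output like `bad`, then the conversion is probably being called with a different source charset than the one selected.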
Attachments (1)
Change History (19)
comment:1 Changed 5 years ago by ArchMageZeratuL
- Keywords acknowledged added
comment:2 Changed 5 years ago by Spockie
Opening via File -> "Open Subtitles with Charset..." and choosing charset WINDOWS-1250 works.
Listing the default charset among the offered ones, even without a certainty score, would be nice. Otherwise I must either convert the file to UTF-8 before opening it in Aegisub, or open it from inside Aegisub (opening by just double-clicking via the file association is currently impossible).
Besides offering the current Windows default charset (even without a certainty score), something like a button "I wanna choose charset from all supported regardless autodetected possible charsets" could be helpful.
comment:3 Changed 5 years ago by ArchMageZeratuL
- Keywords resolved added; acknowledged removed
- Owner set to ArchMageZeratuL
- Resolution set to fixed
- Status changed from new to closed
Hopefully resolved.
comment:4 Changed 5 years ago by Spockie
- Keywords feedback added; resolved removed
- Resolution fixed deleted
- Status changed from closed to assigned
In Aegisub 2.1.2 it works differently - I suppose that's the change you made.
Opening via File -> "Open Subtitles with Charset..." and choosing charset WINDOWS-1250 works.
When opening normally, it lists several badly detected charsets with a certainty score, plus "Unknown - windows-1250 (local)". However, when I choose the windows-1250 line, it doesn't work correctly. In some cases all non-ASCII characters get converted to nonsense.
In at least one case I get two error messages "Cannot convert from the charset 'IBM866'!". When opening that file, IBM866 is listed among the detected charsets, but not with the highest score.
Changed 5 years ago by Spockie
comment:5 Changed 5 years ago by Spockie
I uploaded the file that generates these error messages - Dragon_Ball_-_075_-_The_Strong_Ones_-_[aX](cdf05199).CZ.ssa (compressed with 7-zip).
comment:6 Changed 4 years ago by TheFluff
- Keywords confirmed added; feedback removed
- Status changed from assigned to new
comment:7 Changed 4 years ago by TheFluff
- Milestone changed from 2.1.0 to 2.1.6
- Severity changed from minor to major
comment:8 Changed 4 years ago by TheFluff
Updated with info from #806.
comment:9 Changed 4 years ago by nielsm
- Milestone changed from 2.1.6 to 2.2.0
- Platform set to All
comment:10 Changed 4 years ago by nielsm
It's probably the "local" option in the detected charsets list that doesn't work as advertised.
comment:11 Changed 4 years ago by nielsm
- Owner changed from ArchMageZeratuL to nielsm
- Status changed from new to accepted
comment:12 Changed 4 years ago by nielsm
- Keywords universalchardet added; confirmed removed
- Owner nielsm deleted
- Status changed from accepted to assigned
I think this is mostly about universalchardet failing, so I'm giving it to verm.
If 'local' encoding fails to open a file that really is in the local encoding, there isn't much we can do. The encoding conversion isn't something we do, that's system libraries.
comment:13 Changed 4 years ago by nielsm
I said "giving it to verm"... okay?
comment:14 Changed 4 years ago by nielsm
- Status changed from assigned to new
comment:15 Changed 4 years ago by verm
- Owner set to verm
- Status changed from new to accepted
comment:16 Changed 4 years ago by verm
- Milestone changed from 2.1.7 to 2.2.0
Move ticket to milestone:2.2.0 as milestone:2.1.7 became a windows-only maintenance release.
comment:17 Changed 4 years ago by Plorkyeran
- Resolution set to fixed
- Status changed from accepted to closed
comment:18 Changed 3 years ago by verm
- Milestone changed from 2.2.0 to 3.0.0
Bump 2.2.0 tickets to milestone:3.0.0 (2.2.0 is becoming 3.0.0)

I can't add charset support, it's a wx thing. I guess that I could make it list default, but it won't be able to give you a certainty score for it.
Either way, you should be able to open that file via File -> "Open Subtitles with Charset...", assuming that we DO have support for Win-1250.