Opened 5 years ago

Closed 4 years ago

#666 closed defect (fixed)

Problem opening non-Unicode subtitle file

Reported by: Spockie
Owned by: verm
Priority: normal
Milestone: 3.0.0
Component: Subtitle
Version: 2.1.6
Severity: major
Keywords: universalchardet
Cc:
Platform: All
Sub Component:

Description

The subtitle file contains text in codepage win-1250.

The default charset (in Windows) is set to win-1250.

When opening that subtitle file with Aegisub r1847, it shows a message that it could not narrow the character set down to a single one, and offers several character sets to choose from (win-1250 is not among them).

working workaround:
Before opening the file in Aegisub, convert it from win-1250 to UTF-8 in any text editor capable of doing so (e.g. Notepad, PSPad, ...).
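
The same conversion can be scripted instead of done by hand in an editor. The following is only a minimal sketch in Python, assuming the file really is Windows-1250 (Python's codec name for it is `cp1250`); the function names `recode_bytes` and `recode_file` are made up for illustration and have nothing to do with Aegisub itself.

```python
from pathlib import Path


def recode_bytes(data: bytes, source_charset: str = "cp1250") -> bytes:
    """Decode bytes from source_charset and re-encode them as UTF-8."""
    # Strict decoding raises UnicodeDecodeError on bytes that are not
    # valid in source_charset, instead of silently mangling them.
    return data.decode(source_charset, errors="strict").encode("utf-8")


def recode_file(src: str, dst: str, source_charset: str = "cp1250") -> None:
    """Write a UTF-8 copy of src to dst; src itself is left untouched."""
    Path(dst).write_bytes(recode_bytes(Path(src).read_bytes(), source_charset))
```

Calling `recode_file("subs.ssa", "subs.utf8.ssa")` (hypothetical file names) would then produce a copy that can be opened without any charset prompt.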

btw: Aegisub v1.10 works fine

  • it is probably just converting from the "default windows charset" to utf-8
  • which is perfectly fine for a subtitle file that is in the "default windows charset"

proposed solution:
a) add win-1250 to supported charsets
b) "automagically" add "default windows charset" to supported charsets

ADDITIONAL INFORMATION:
The above information is slightly outdated, see notes. New info (copied from #806):
Opening via File -> "Open Subtitles with Charset..." and choosing charset WINDOWS-1250 works.

When opening normally it lists several badly detected charsets with a certainty score, plus "Unknown - windows-1250 (local)". However, when I choose the line with windows-1250, it doesn't work correctly. In some cases all non-ASCII characters get converted to nonsense. It seems to me that even if I choose windows-1250, the converting function is called with a "source charset" parameter other than windows-1250.

In at least one case I get two error messages "Cannot convert from the charset 'IBM866'!". When opening that file, IBM866 is listed among the detected charsets, but not with the highest score.
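
The symptom described here (ASCII text intact, everything else turned into nonsense) is what a mismatched source charset typically produces. The sketch below only illustrates that effect in Python; it is not Aegisub's conversion code, the sample text is invented, and `cp1250` / `cp866` are Python's codec names for Windows-1250 and IBM866.

```python
def decode_as(data: bytes, charset: str) -> str:
    """Decode data as charset, replacing undecodable bytes with U+FFFD."""
    return data.decode(charset, errors="replace")


def ascii_part(text: str) -> str:
    """Keep only the ASCII characters of text."""
    return "".join(ch for ch in text if ch.isascii())


# Hypothetical sample line; any text with Czech diacritics would do.
original = "Příliš žluťoučký kůň"
raw = original.encode("cp1250")   # bytes as stored in a win-1250 file

right = decode_as(raw, "cp1250")  # source charset matches the file
wrong = decode_as(raw, "cp866")   # source charset is IBM866 instead

# Both codecs are single-byte, so the ASCII characters come out the same
# either way; only the accented letters differ between right and wrong.
same_ascii = ascii_part(right) == ascii_part(wrong)
```

Printing `right` and `wrong` side by side shows the accented letters replaced by unrelated characters in the second string, while the plain ASCII letters stay put.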

Attachments (1)

Dragon_Ball_-_075_-_The_Strong_Ones_-_[aX](cdf05199).CZ.7z (4.0 KB) - added by Spockie 5 years ago.


Change History (19)

comment:1 Changed 5 years ago by ArchMageZeratuL

  • Keywords acknowledged added

I can't add charset support, it's a wx thing. I guess that I could make it list default, but it won't be able to give you a certainty score for it.

Either way, you should be able to open that file via File -> "Open Subtitles with Charset...", assuming that we DO have support for Win-1250
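
For illustration only, listing the default charset next to the detected ones, without a certainty score, might look roughly like the Python sketch below. This is not Aegisub or wx code: the function `with_local_charset` and the `(name, score)` data shape are assumptions made for the example, and only `locale.getpreferredencoding` is a real standard-library call.

```python
import locale
from typing import List, Optional, Tuple

# (charset name, certainty score); the score is None when unknown.
Candidate = Tuple[str, Optional[float]]


def with_local_charset(
    detected: List[Tuple[str, float]], local: Optional[str] = None
) -> List[Candidate]:
    """Return the detected candidates plus the local charset, unscored."""
    if local is None:
        # Ask the system for its default text encoding.
        local = locale.getpreferredencoding(False)
    candidates: List[Candidate] = list(detected)
    known = {name.lower() for name, _ in detected}
    # Only add the local charset if the detector did not already list it.
    if local.lower() not in known:
        candidates.append((local, None))
    return candidates
```

With a detector result of `[("IBM866", 0.4)]` and a local charset of `windows-1250`, the list gains one extra entry whose score is `None`, which a dialog could display as "Unknown".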

comment:2 Changed 5 years ago by Spockie

Opening via File -> "Open Subtitles with Charset..." and choosing charset WINDOWS-1250 works.

Listing the default in the charset list even without a certainty score would be nice. Otherwise I must either convert the file to UTF-8 before opening it in Aegisub, or open it from inside Aegisub (opening it by just double-clicking, via the file association, is currently impossible).

Besides offering the current Windows default charset (even without a certainty score), something like a button "I wanna choose charset from all supported regardless autodetected possible charsets" could be helpful.

comment:3 Changed 5 years ago by ArchMageZeratuL

  • Keywords resolved added; acknowledged removed
  • Owner set to ArchMageZeratuL
  • Resolution set to fixed
  • Status changed from new to closed

Hopefully resolved.

comment:4 Changed 5 years ago by Spockie

  • Keywords feedback added; resolved removed
  • Resolution fixed deleted
  • Status changed from closed to assigned

In Aegisub 2.1.2 it works differently - I suppose that's the change you made.

Opening via File -> "Open Subtitles with Charset..." and choosing charset WINDOWS-1250 works.

When opening normally it lists several badly detected charsets with a certainty score, plus "Unknown - windows-1250 (local)". However, when I choose the line with windows-1250, it doesn't work correctly. In some cases all non-ASCII characters get converted to nonsense.

In at least one case I get two error messages "Cannot convert from the charset 'IBM866'!". When opening that file, IBM866 is listed among the detected charsets, but not with the highest score.

comment:5 Changed 5 years ago by Spockie

I uploaded the file that generates these error messages - Dragon_Ball_-_075_-_The_Strong_Ones_-_[aX](cdf05199).CZ.ssa (compressed with 7-zip).

comment:6 Changed 4 years ago by TheFluff

  • Keywords confirmed added; feedback removed
  • Status changed from assigned to new

comment:7 Changed 4 years ago by TheFluff

  • Milestone changed from 2.1.0 to 2.1.6
  • Severity changed from minor to major

comment:8 Changed 4 years ago by TheFluff

Updated with info from #806.

comment:9 Changed 4 years ago by nielsm

  • Milestone changed from 2.1.6 to 2.2.0
  • Platform set to All

comment:10 Changed 4 years ago by nielsm

It's probably the "local" option in the detected charsets list that doesn't work as advertised.

comment:11 Changed 4 years ago by nielsm

  • Owner changed from ArchMageZeratuL to nielsm
  • Status changed from new to accepted

comment:12 Changed 4 years ago by nielsm

  • Keywords universalchardet added; confirmed removed
  • Owner nielsm deleted
  • Status changed from accepted to assigned

I think this is mostly about universalchardet failing, so I'm giving it to verm.
If the 'local' encoding fails to open a file that really is in the local encoding, there isn't much we can do. The encoding conversion isn't something we do ourselves; that's handled by system libraries.

comment:13 Changed 4 years ago by nielsm

I said "giving it to verm"... okay?

comment:14 Changed 4 years ago by nielsm

  • Status changed from assigned to new

comment:15 Changed 4 years ago by verm

  • Owner set to verm
  • Status changed from new to accepted

comment:16 Changed 4 years ago by verm

  • Milestone changed from 2.1.7 to 2.2.0

Move ticket to milestone:2.2.0 as milestone:2.1.7 became a windows-only maintenance release.

comment:17 Changed 4 years ago by Plorkyeran

  • Resolution set to fixed
  • Status changed from accepted to closed

Fixed in [3137]. See #877.

comment:18 Changed 3 years ago by verm

  • Milestone changed from 2.2.0 to 3.0.0

Bump 2.2.0 tickets to milestone:3.0.0 (2.2.0 is becoming 3.0.0)
