2

character encoding

asked 2015-04-16 10:57:45 +0100

czsan

How can I change the character encoding in the Sage notebook? I'm Hungarian, and in strings I need characters like á, ő, ű, etc., not \xc3, \xc5 and so on.


Comments

See also the presumably identical http://ask.sagemath.org/question/8249...

kcrisman (2015-04-16 16:48:15 +0100)
kcrisman (2015-04-28 20:29:22 +0100)

4 Answers

1

answered 2015-04-16 16:53:19 +0100

kcrisman

Apparently for now you may have to use the print command.

print u"gömböc"

works, indeed even without the u. See e.g. http://stackoverflow.com/questions/10...
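For readers on a Python 3 based Sage, here is a minimal sketch of the same effect: the escapes appear only in the escaped representation of a string, while print writes the characters themselves. The variable names are illustrative, and `ascii()` is used here only as a stand-in for the escaped echo of the older notebook; it is an assumption that your setup behaves like plain Python 3.

```python
# Python 3 sketch: escaped representation versus printed text.
word = "gömböc"

# ascii() returns the repr with non-ASCII characters escaped,
# similar to what the older notebook echoed back.
escaped = ascii(word)
print(escaped)

# print() writes the characters themselves.
print(word)
```

The string is the same in both cases; only the way it is displayed differs.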


Comments

1

You are right, print is the key to printing a string properly. Prefixing u to the string makes it a unicode string rather than a byte string, which might matter for the subsequent use of the string.

slelievre (2015-04-16 19:08:42 +0100)
0

answered 2023-09-14 09:57:23 +0100

lauryfriese

updated 2023-09-15 03:41:02 +0100

To change the character encoding in a Sage notebook or any Python environment, you typically need to ensure that your source code files are saved with the correct encoding and that you handle string literals properly. Here are a few steps you can follow:

  1. Save your source code files with the correct encoding: Use a text editor that allows you to specify the encoding when saving your Python files. Choose an encoding that supports the Hungarian characters you need, such as UTF-8. UTF-8 is a widely used encoding that can handle a wide range of characters.
  2. Specify the encoding in your Python script: At the top of your Python script or notebook, include a comment indicating the encoding of your source code. This is known as a "coding declaration" and helps the Python interpreter correctly interpret the characters in your script. For example, if you are using UTF-8 encoding, your coding declaration would be:

```python
# -*- coding: utf-8 -*-
```

  3. Use Unicode escape sequences: If you cannot type special characters directly in your string literals, you can use Unicode escape sequences, which represent characters by their Unicode code points. For example, to represent the character "á", you can use the escape sequence "\u00E1". Here's an example:

text = "This is an example with special characters: \u00E1, \u0151, \u0171"
print(text)

Do not prefix such a string with the letter "r". Raw strings treat backslashes literally and leave escape sequences uninterpreted, so the following prints \u00E1, \u0151, \u0171 as written rather than á, ő, ű:

text = r"This is an example with special characters: \u00E1, \u0151, \u0171"
print(text)

By using these techniques, you can ensure that your Hungarian characters are properly represented in your Sage notebook or Python code, allowing you to work with them directly without relying on escape sequences like \xc3 or \xc5.
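A quick check of how the two literal forms behave, as a minimal sketch assuming Python 3 (the variable names are illustrative): in a normal string the escapes become single characters, while in a raw string the backslashes stay in the text.

```python
# Python 3 sketch: escape sequences versus raw strings.
escaped = "\u00E1\u0151\u0171"   # escapes are interpreted: three characters
raw = r"\u00E1\u0151\u0171"      # backslashes kept literally: 18 characters

print(escaped)
print(len(escaped), len(raw))
print(escaped == "áőű")
```

Only the first form yields the accented characters; the raw form is useful when the backslashes themselves are wanted, for example in regular expressions.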

0

answered 2015-04-16 12:14:31 +0100

slelievre

updated 2015-04-17 13:27:53 +0100

EDITED. My original answer

You can prefix a string with the letter u to mark it as a unicode string, e.g. u'gömböc'.

was not so helpful. The notes below are maybe more a related discussion than a proper answer.

You can prefix a string with the letter u to mark it as a unicode string. If you are inputting unicode characters, this will affect how the string is encoded.

Here is what I get in the Sage REPL.

sage: 'Erdős'
'Erd\xc5\x91s'
sage: u'Erdős'
u'Erd\u0151s'

This shows a difference in the escape codes used for accented characters.

Apparently @kcrisman's indication to use print is the key to properly displaying unicode strings.

sage: print 'Erdős'
Erdős
sage: print u'Erdős'
Erdős

The role of the u prefix is not so apparent here.

The u is useful if you are using unicode escape codes in the string.

sage: print 'Erd\u0151s'
Erd\u0151s
sage: print u'Erd\u0151s'
Erdős

The other version:

sage: print 'Erd\xc5\x91s'
Erdős
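In Python 3, which current Sage versions use, the same distinction shows up as bytes versus str. A minimal sketch, with illustrative variable names, of how the \xc5\x91 escapes relate to the text:

```python
# Python 3 sketch: text (str) versus its UTF-8 bytes.
name = "Erdős"

# Encoding to UTF-8 yields the byte escapes seen in the session above.
data = name.encode("utf-8")
print(data)

# Decoding the bytes gives the text back.
print(data.decode("utf-8"))
```

Here ő is one character in `name` but two bytes in `data`, which is why the two lengths differ.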

Comments

The response is u'g\xf6mb\xf6c'

czsan (2015-04-16 12:26:16 +0100)
0

answered 2015-08-07 09:24:42 +0100

A character encoding tells the computer how to interpret raw zeroes and ones as characters, usually by pairing numbers with characters. Words and sentences in text are built from characters, and these characters are grouped into a character set. Many character encodings are in use, but the ones encountered most often are ASCII, 8-bit encodings, and Unicode-based encodings.
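The pairing of numbers with characters can be made concrete with a minimal sketch, assuming Python 3 and its standard codecs (the variable names are illustrative). It shows one Hungarian character, its Unicode code point, and its bytes in a Unicode-based encoding and in an 8-bit encoding:

```python
# Python 3 sketch: one character, one code point, different byte encodings.
ch = "ő"

code_point = ord(ch)               # the number Unicode pairs with the character
utf8 = ch.encode("utf-8")          # two bytes in UTF-8
latin2 = ch.encode("iso8859_2")    # one byte in ISO 8859-2 (Latin-2)

print(hex(code_point), utf8, latin2)
```

The \xc5 in the question is the first of the two UTF-8 bytes of this character.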

Mercal



Stats

Asked: 2015-04-16 10:57:45 +0100

Seen: 2,276 times

Last updated: Sep 15 '23