character encoding

asked 2015-04-16 10:57:45 +0200

czsan
237 ●9 ●16 ●26

How can I change the character encoding in the Sage notebook? I'm Hungarian, and in strings I need characters like á, ő, ű, etc., not \xc3, \xc5 and so on.

Comments

See also the presumably identical http://ask.sagemath.org/question/8249...

kcrisman ( 2015-04-16 16:48:15 +0200 )

kcrisman ( 2015-04-28 20:29:22 +0200 )

1

answered 2015-04-16 16:53:19 +0200

kcrisman
12252 ●42 ●136 ●255

Apparently for now you may have to use the print command.

print u"gömböc"

works, indeed even without the u. See e.g. http://stackoverflow.com/questions/10...
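
The difference between echoing a string and printing it can be reproduced in Python 3, which recent Sage versions use. The following is only a minimal sketch: the built-in `ascii()` is used here as a stand-in for the escaped form the old notebook echoed, and the variable names are illustrative.

```python
# Python 3 sketch: escaped representation vs. printed text.
word = "gömböc"

# ascii() returns a repr in which every non-ASCII character is escaped,
# comparable to what echoing a unicode string showed in Python 2.
escaped = ascii(word)
print(escaped)   # 'g\xf6mb\xf6c'

# print() writes the characters themselves.
print(word)
```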

Comments

1

You are right, print is the key to printing a string properly. Prefixing u to the string changes how it is encoded, which might matter for the subsequent use of the string.

slelievre ( 2015-04-16 19:08:42 +0200 )

0

answered 2023-09-14 09:57:23 +0200

lauryfriese
1 ●1

updated 2023-09-15 03:41:02 +0200

To change the character encoding in a Sage notebook or any Python environment, you typically need to ensure that your source code files are saved with the correct encoding and that you handle string literals properly. Here are a few steps you can follow:

Save your source code files with the correct encoding: Use a text editor that allows you to specify the encoding when saving your Python files. Choose an encoding that supports the Hungarian characters you need, such as UTF-8. UTF-8 is a widely used encoding that can handle a wide range of characters.
Specify the encoding in your Python script: At the top of your Python script or notebook, include a comment indicating the encoding of your source code. This is known as a "coding declaration" and helps the Python interpreter correctly interpret the characters in your script. For example, if you are using UTF-8 encoding, your coding declaration would be:

```python
# -*- coding: utf-8 -*-
```

Use Unicode escape sequences: Instead of typing special characters directly in your string literals, you can write them as Unicode escape sequences, which represent characters by their Unicode code points. For example, the character "á" can be written as "\u00E1". In Python 2, the escape is only interpreted in a string with the u prefix. Here's an example:

text = "This is an example with special characters: \u00E1, \u0151, \u0171"
print(text)

Raw strings, which are prefixed with the letter "r", do the opposite: they treat backslashes literally and do not interpret escape sequences. The following example therefore prints the text \u00E1, \u0151, \u0171 rather than the characters á, ő, ű:

text = r"This is an example with special characters: \u00E1, \u0151, \u0171"
print(text)

By using these techniques, you can ensure that your Hungarian characters are properly represented in your Sage notebook or Python code, allowing you to work with them directly without relying on escape sequences like \xc3 or \xc5.
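
How escape sequences and raw strings differ can be checked with a small Python 3 sketch. The variable names are hypothetical, and the behaviour shown is that of Python 3, not of the Python 2 based Sage versions discussed elsewhere in this thread.

```python
# Python 3 sketch: Unicode escapes vs. raw strings.
escaped = "\u00E1, \u0151, \u0171"    # escapes are interpreted
raw = r"\u00E1, \u0151, \u0171"       # backslashes are kept literally

print(escaped == "á, ő, ű")   # True
print(len(escaped))           # 7
print(len(raw))               # 22
print(raw == escaped)         # False
```

Only the non-raw string contains the three accented characters; the raw string still holds the backslashes and hex digits.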

0

answered 2015-04-16 12:14:31 +0200

slelievre
17839 ●22 ●164 ●354 http://carva.org/samue...

updated 2015-04-17 13:27:53 +0200

EDITED. My original answer

You can prefix a string with the letter u to mark it as a unicode string, eg u'gömböc'.

was not so helpful. The notes below are maybe more a related discussion than a proper answer.

You can prefix a string with the letter u to mark it as a unicode string. If you are inputting unicode characters, this will affect how the string is encoded.

Here is what I get in the Sage REPL.

sage: 'Erdős'
'Erd\xc5\x91s'
sage: u'Erdős'
u'Erd\u0151s'

This shows a difference in the escape codes used for accented characters.

Apparently @kcrisman's indication to use print is the key to properly displaying unicode strings.

sage: print 'Erdős'
Erdős
sage: print u'Erdős'
Erdős

The role of the u prefix is not so apparent here.

The u is useful if you are using unicode escape codes in the string.

sage: print 'Erd\u0151s'
Erd\u0151s
sage: print u'Erd\u0151s'
Erdős

The other version:

sage: print 'Erd\xc5\x91s'
Erdős
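
The two escaped forms seen above are related by UTF-8 encoding. A minimal Python 3 sketch of that relationship follows; it assumes Python 3, where byte strings and text strings are separate types, so it is not a transcript of the Sage session above.

```python
# Python 3 sketch: the two escaped forms of "Erdős" from the REPL session.
utf8_bytes = b"Erd\xc5\x91s"   # UTF-8 bytes, as in Python 2's 'Erdős'
text = "Erd\u0151s"            # code points, as in Python 2's u'Erdős'

# Decoding the bytes as UTF-8 gives the text, and encoding goes back.
print(utf8_bytes.decode("utf-8") == text)   # True
print(text.encode("utf-8") == utf8_bytes)   # True

# One character, but two bytes in UTF-8.
print(len(text), len(utf8_bytes))           # 5 6
```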

Comments

The response is u'g\xf6mb\xf6c'

czsan ( 2015-04-16 12:26:16 +0200 )

0

answered 2015-08-07 09:24:42 +0200

yanmercal
1

A character encoding tells the computer how to interpret raw zeroes and ones as real characters. It usually does this by pairing numbers with characters. Words and sentences in text are created from characters, and these characters are grouped into a character set. There are many different character encodings in use at present, but the ones we deal with most frequently are ASCII, 8-bit encodings, and Unicode-based encodings.

Mercal
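
The pairing of numbers with characters can be illustrated with a short Python 3 sketch. The choice of the character ő and of the three encodings is only an example, not something taken from the answer above.

```python
# Python 3 sketch: one character, several encodings.
char = "\u0151"   # ő, LATIN SMALL LETTER O WITH DOUBLE ACUTE

# Unicode assigns the character a number (its code point).
print(hex(ord(char)))                 # 0x151

# An encoding maps that number to bytes.
print(char.encode("utf-8"))           # b'\xc5\x91'
print(char.encode("iso-8859-2"))      # b'\xf5'

# ASCII has no byte for it, so encoding fails.
try:
    char.encode("ascii")
except UnicodeEncodeError:
    print("not representable in ASCII")
```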
