Windows psql and utf8 client_encoding issues

Prior to pg 9.1, you could connect to any database with psql regardless of encoding and you’d get the server encoding as your client encoding unless you set it:

C:\>chcp 1252
Active code page: 1252

C:\>"C:\Program Files\PostgreSQL\9.0\bin\psql" -U glyn -d TEST -h pgtest
psql (9.0.22, server 9.4.4)
WARNING: psql version 9.0, server version 9.4.
         Some psql features might not work.
SSL connection (cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256)
Type "help" for help.

TEST=> show client_encoding;
 client_encoding
-----------------
 LATIN1
(1 row)

That wasn’t quite right; the console was actually using WIN1252, so that client_encoding was a lie. On a modern psql version that’s quite rightly prevented:

C:\>"C:\Program Files (x86)\pgAdmin III\1.20\psql" -U glyn -d TEST -h pgtest
psql: FATAL:  conversion between WIN1252 and LATIN1 is not supported

This is not an issue if you want to connect to a utf8 database, but the issue I had this morning was connecting to a latin1 database with psql from a Windows client (something I do rarely). If I set the console code page to UTF-8 (65001) to match a utf8 client_encoding, I got a “Not enough memory.” error:

C:\>chcp 65001
Active code page: 65001

C:\>set PGCLIENTENCODING=UTF8

C:\>"C:\Program Files (x86)\pgAdmin III\1.20\psql" -U glyn -d TEST -h pgtest
psql (9.4.0, server 9.4.4)
WARNING: Console code page (65001) differs from Windows code page (1252)
         8-bit characters might not work correctly. See psql reference
         page "Notes for Windows users" for details.
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
Type "help" for help.

TEST=> show client_encoding;
Not enough memory.

I could set the code page to 1252, but that would mean my setting for client_encoding would be a lie, and if I were to then revert to set client_encoding='WIN1252' I’d have come full circle and be back at the “FATAL: conversion between WIN1252 and LATIN1 is not supported” error message.

A quick Google search revealed these bug reports with no solutions. Another dig at the docs revealed the following passage:

pager

Controls use of a pager program for query and psql help output. If the environment variable PAGER is set, the output is piped to the specified program. Otherwise a platform-dependent default (such as more) is used.

So how does more behave?

C:\>chcp 65001
Active code page: 65001

C:\>more
Not enough memory.

Bingo! So if I turn the pager off the error should go:

C:\>chcp 65001
Active code page: 65001

C:\>set PGCLIENTENCODING=UTF8
C:\>"C:\Program Files (x86)\pgAdmin III\1.20\psql" -U glyn -d TEST -h pgtest
WARNING: Console code page (65001) differs from Windows code page (1252)
         8-bit characters might not work correctly. See psql reference
         page "Notes for Windows users" for details.
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
Type "help" for help.

TEST=> \pset pager off
Pager usage is off.
TEST=> show client_encoding;
 client_encoding
-----------------
 UTF8
(1 row)

Culprit found; it’s quite embarrassing that it didn’t occur to me sooner that the source of such a verbose error as “Not enough memory.” was Microsoft’s more. So let’s try a different pager, less from GnuWin32:

C:\>chcp 65001
Active code page: 65001

C:\>set PGCLIENTENCODING=UTF8
C:\>set PAGER="C:\Program Files (x86)\gnuwin32\bin\less"
C:\>set LESS=--quit-at-eof

C:\>"C:\Program Files (x86)\pgAdmin III\1.20\psql" -U glyn -d TEST -h pgtest
psql (9.4.0, server 9.4.4)
WARNING: Console code page (65001) differs from Windows code page (1252)
         8-bit characters might not work correctly. See psql reference
         page "Notes for Windows users" for details.
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
Type "help" for help.

TEST=> show client_encoding;

 client_encoding
-----------------
 UTF8
(1 row)

(END)
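
As an aside, rather than turning the pager off in every session, the setting can be made permanent in psql’s startup file, which on Windows psql reads from %APPDATA%\postgresql\psqlrc.conf:

\pset pager off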

PostgreSQL Latin1 database with ASP.Net Unicode and DOS codepage 850 applications

So the title sums it up without going into detail; we basically have a legacy DOS application that had its underlying data files migrated into a latin1-encoded PostgreSQL database with no form of intelligent mapping (i.e. the cp850 “ú” character, which is code point 163, is seen as the character “£” in latin1).

Converting to UTF8 at this point would have saved lots of hassle; however, at the time of conversion the existing data wouldn’t easily go into utf8 without further intermediate conversion. Even then UTF8 wouldn’t be a magic bullet; we’d still have to sanitize the incoming data so we didn’t store anything the DOS side (or any other equipment) couldn’t read.

Fast forward a few years and now there are two main apps hitting the database: an ASP.Net application and the aforementioned legacy DOS system. Both are heavily used and the DOS system isn’t going anywhere, so we need a way for both systems to get along; crucially, the DOS application must be able to understand everything the ASP.Net application puts in the database, and the ASP.Net application must be able to display the DOS codepage correctly in UTF8.

So how do we do this? All my attempts using the PostgreSQL encode/decode functions failed due to there being no cp850 representation, and although I’m sure it’s possible to do the conversion in SQL using something like translate/convert and a large mapping string, I don’t have time to fiddle with it right now. However, Perl’s Encode module can convert to and from cp850, so one solution from within PostgreSQL is to write a pair of untrusted pl/perl functions like:

CREATE OR REPLACE FUNCTION cp850_to_utf8(text)
RETURNS text AS
$BODY$
use Encode;
# Interpret the latin1-stored bytes as cp850, then re-encode them as
# latin1 so the intended characters survive in the latin1 database.
return encode('iso-8859-1', decode('cp850', $_[0]));
$BODY$
LANGUAGE plperlu IMMUTABLE;

CREATE OR REPLACE FUNCTION utf8_to_cp850(text)
RETURNS text AS
$BODY$
use Encode;
# The reverse: produce the byte values the cp850 DOS application expects.
return encode('cp850', decode('iso-8859-1', $_[0]));
$BODY$
LANGUAGE plperlu IMMUTABLE;

For my current problem this means if either application writes something like “Straße” the other can read back the same representation.
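
Here’s an illustrative psql session (assuming the functions above are installed in the latin1 database and client_encoding is UTF8; cp850’s “ß” is byte 0xE1, which latin1 renders as “á”, so the DOS application’s “Straße” sits in the latin1 database as “Straáe”):

TEST=> SELECT cp850_to_utf8('Straáe');
 cp850_to_utf8
---------------
 Straße
(1 row)

TEST=> SELECT utf8_to_cp850('Straße');
 utf8_to_cp850
---------------
 Straáe
(1 row)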

From the ASP.Net side we can do the conversion outside the database with a similar process. Converting text to go into the database involves converting the string from unicode into cp850 bytes, then reinterpreting those bytes as latin1 to get the correct representation on the DOS side:

private static Encoding _eCp850 = Encoding.GetEncoding(850);
private static Encoding _eUnicode = Encoding.UTF8;
private static Encoding _eLatin1 = Encoding.GetEncoding("ISO-8859-1");

public static string encode_cp850(string sText)
{
	string sReturn;
	byte[] bSource;
	byte[] bTarget;

	// Get the UTF-8 bytes of the string, convert them to cp850, then
	// reinterpret those cp850 bytes as latin1 so the resulting string
	// carries the byte values the DOS application expects.
	bSource = _eUnicode.GetBytes(sText);
	bTarget = Encoding.Convert(_eUnicode, _eCp850, bSource);
	sReturn = _eLatin1.GetString(bTarget);

	return sReturn;
}

Reading is a little trickier though, as all strings in .Net are Unicode (stored internally as UTF-16), so once the value has been read back into a string our original byte representation has already been mangled and we can’t easily do the conversion. However, if we read the value back from the database as bytea into a byte array, we can successfully convert it into the correct UTF8 representation:

public static string decode_cp850(byte[] sTextAsBytea)
{
	string sReturn;
	byte[] bSource = sTextAsBytea;
	byte[] bTarget;

	// The bytea value holds the raw cp850 bytes; convert them to UTF-8
	// and build a proper .Net string from the result.
	bTarget = Encoding.Convert(_eCp850, _eUnicode, bSource);
	sReturn = _eUnicode.GetString(bTarget);

	return sReturn;
}
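
To tie the two together, the round trip might look something like the following sketch (illustrative only: the connection string, table and column names are hypothetical, and convert_to() is used in the SELECT so the raw latin1 bytes come back as bytea):

using (var conn = new NpgsqlConnection("Server=pgtest;Database=TEST;User Id=glyn;"))
{
	conn.Open();

	// Writing: encode first so the latin1 database ends up holding the
	// cp850 byte values the DOS application expects.
	using (var cmd = new NpgsqlCommand("UPDATE stock SET descr = @d WHERE id = 1", conn))
	{
		cmd.Parameters.AddWithValue("d", encode_cp850("Straße"));
		cmd.ExecuteNonQuery();
	}

	// Reading: convert_to() hands back the raw latin1 bytes as bytea,
	// which decode_cp850 reinterprets as cp850 to give "Straße" again.
	using (var cmd = new NpgsqlCommand("SELECT convert_to(descr, 'LATIN1') FROM stock WHERE id = 1", conn))
	{
		byte[] bRaw = (byte[])cmd.ExecuteScalar();
		string sText = decode_cp850(bRaw);
	}
}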

I’m still wondering if there is a better way (I know, I know, yes it’s to have everything in UTF8), but for now this seems to work well enough for us.

I’ve uploaded some scripts to my git repository at postgresql/encoding_woes (also mirrored on GitHub at glynastill/encoding_woes).

Uploading old stuff – DBLT

I’ve recently been digging through a bunch of old test programs and scripts I’ve written over the years, and thought I’d upload some of them in case they’re of any use to anyone. I’ll start off with a console app I’d named DBLT.

DBLT is simply the abbreviation for “database load test” that jumped into my head when faced with the Visual Studio “New Project” dialog. I wanted to perform some load tests in a way that more closely represented a specific application and measured the things I was interested in. I don’t class it as a polished testing platform, and I take the results of my tests with a pinch of salt, as I assume some of my techniques in running the tests and locking when maintaining the statistics will have skewed them in some way, but hopefully only slightly.

The test is written in C# and uses the Npgsql data provider to connect to PostgreSQL. As I mentioned initially, the main reason for this is that I wanted to test different configurations for connecting to our databases in a similar way to how a specific application did. Mainly this was to test the performance of Npgsql’s local connection pool logic and its associated parameters, but also to test other parameters on our pgBouncer pools, PostgreSQL and Linux hosts.

My other reason for writing my own test is that pgbench outputs results based around TPS, but I wanted a slightly different take on the statistics. I wanted to know things like how long my queries were taking on average, what the worst and best execution times were, and the standard deviation of execution time, to see that we were providing a good quality of service to all clients rather than just shunting the most TPS through the database. If you’re now thinking “the fool – he should have just run pgbench and run an analysis on his logs”, you’re quite right; but that wouldn’t have given me an overall view of the behaviour of the system once we start including things like cascaded connection pools (local Npgsql and pgBouncer), networking and local client machine configuration. I could have analyzed some of this using tools like Tsung, but I had issues generating heavy load with it and its output didn’t give me exactly what I wanted.
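
On the statistics side, maintaining running figures like the mean and standard deviation across worker threads is simple enough with an online algorithm under a lock; the sketch below (illustrative only, not DBLT’s actual code; the class and member names are invented here) uses Welford’s method:

using System;

// Illustrative lock-guarded accumulator for query execution times,
// tracking count, mean, min/max and variance via Welford's algorithm.
public class ExecStats
{
	private readonly object _lock = new object();
	private long _n;
	private double _mean, _m2;
	private double _min = double.MaxValue, _max = double.MinValue;

	public void Record(double ms)
	{
		lock (_lock) // contention here is the potential skew mentioned above
		{
			_n++;
			double delta = ms - _mean;
			_mean += delta / _n;
			_m2 += delta * (ms - _mean);
			if (ms < _min) _min = ms;
			if (ms > _max) _max = ms;
		}
	}

	public double Mean   { get { lock (_lock) return _mean; } }
	public double StdDev { get { lock (_lock) return _n > 1 ? Math.Sqrt(_m2 / (_n - 1)) : 0.0; } }
	public double Min    { get { lock (_lock) return _min; } }
	public double Max    { get { lock (_lock) return _max; } }
}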

I wrote dblt back in 2009, and I’ve found it useful on quite a few occasions but never thought to share it before. I’ve recently signed up to GitHub, so the source can be obtained from glynastill/dblt or my git repository at postgresql/dblt. I’ve also uploaded a compiled binary here: dblt.

Hopefully someone finds it useful or highlights failings that can be fixed. Even if you don’t want to test against PostgreSQL, it could quite easily be used to test SQL Server or other systems by changing the data provider.