Two kinds of text???

mathog
Posts: 82
Joined: Tue Feb 08, 2011 6:05 am

Postby mathog » Thu Oct 27, 2011 3:14 am

I have been trying to import some HTML pages which are mostly text into Powerpoint 2003 presentations - keeping the text as text. A little bit of editing is needed too.

The obvious methods for getting it into PPT do not work. Open the page in a browser, select all, copy, open a slide in PPT, paste: the result is an unformatted pile of text. Open the saved HTML in Word, select all, copy, paste into PPT: again, the same pile of unformatted text.

This version of PPT will accept (in theory) vector input in WMF or EMF formats. (Also CGM, which is supposedly supported by uniconverter, but does not appear as an option in Inkscape 0.48 on Windows.)

So open page in a browser, print to pdf.

Opened the PDF in Inkscape which works fairly well. Can edit the resulting objects and text _seems_ to be text. Can click the "A" icon and select a few words by sliding over them. HOWEVER, save as WMF or EMF format, and Insert -> Picture -> From file (use EMF one) in PPT, and the text which was imported from PDF is always converted into drawn objects. Each letter becomes its own drawn object. I think that is probably an issue with uniconverter. A further description of what happens is here:

viewtopic.php?f=5&t=10521

Strangely, if in the same inkscape document, after the PDF is imported, a line of text like "This is normal text" is created using the "Create and Edit Text Objects" button, and it is exported along with the imported PDF text, that line is imported into PPT correctly. So Inkscape seems to have two types of text, and the imported PDF text is somehow cursed as far as export to EMF/WMF is concerned.

Can somebody please explain how one can distinguish between these two types of text, and tell me if there is a way to convert the cursed text into normal text? In the example attached all of the text is cursed except for the line "This is some text entered as a text box".
Attachments
saf_test.svg
example svg with cursed and normal text
(81.36 KiB) Downloaded 156 times

Re: Two kinds of text???

Postby mathog » Thu Oct 27, 2011 3:17 am

I should add that while the Save as... (to EMF/WMF) is being performed, a dialog pops up:

EMF convert
Convert texts to paths [ ]
[Cancel] [OK]

I do NOT check that box, and just click OK.

Re: Two kinds of text???

Postby mathog » Thu Oct 27, 2011 3:52 am

I edited the example down to just two lines, one of each type. The text that stays as text when saved as an EMF is in a <flowRegion> <flowPara> section, whereas the cursed text (from the PDF, converts to drawn objects) is not. Here is the code from the SVG. Simplified file is attached.

Is there some simple way to convert the latter type to the former?

Code:

   transform="matrix(0.8,0,0,-0.8,551.51097,962.3509)"><flowRegion
     id="flowRegion3832"><rect
       id="rect3834"
       width="183.65956"
       height="77.190247"
       x="-334.04745"
       y="464.30777" /></flowRegion><flowPara
     id="flowPara3836">This is some text entered as a text box</flowPara></flowRoot><text
   transform="scale(1,-1)"
   id="text3050"
   x="268.96753"
   y="-651.39398"><tspan
     style="font-size:8.76000023px;font-variant:normal;font-weight:normal;font-stretch:normal;writing-mode:lr-tb;fill:#000000;fill-opacity:1;fill-rule:nonzero;stroke:none;font-family:Courier New;-inkscape-font-specification:CourierNew"
     x="268.96753 274.36633 279.76511 285.28391 290.68271 296.08148 301.48029 306.99908 312.39786 317.79666 323.19543 328.59424 334.11304 339.51181 344.91061 350.30939 355.70819 361.22699 366.62576 372.02457 377.42334 382.94214 388.34094 393.73972 399.13852 404.53729 410.05609 415.4549"
     y="-651.39398"
     sodipodi:role="line"
     id="tspan3052">), Biology Division, Caltech</tspan></text>
Attachments
saf_test.svg
simplified example
(7.32 KiB) Downloaded 254 times

Re: Two kinds of text???

Postby mathog » Thu Oct 27, 2011 4:12 am

Hmm. Flowed text seems to be a red herring. I added one more line of text by clicking the 'A' icon, clicking on the document, and just starting to type. That one is also moved successfully via EMF into PPT, but it has no flow attributes.

Aha, the "x" value for the lines that import as text is a single value, whereas the "x" value for the lines that break up on import is a list. It looks like the PDF specified where every letter goes, and that was imported into the SVG. That is:

x="162.36569"
vs.
x="268.96753 274.36633 279.76511 285.28391 290.68271 296.08148 301.48029 306.99908 312.39786 317.79666 323.19543 328.59424 334.11304 339.51181 344.91061 350.30939 355.70819 361.22699 366.62576 372.02457 377.42334 382.94214 388.34094 393.73972 399.13852 404.53729 410.05609 415.4549"
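For a larger file, the same distinction could be checked with a short script instead of by eye. This is only a rough sketch: the function name `find_per_glyph_text` and the cut-down sample document are made up for illustration, and it assumes the only thing that matters is whether the "x" attribute of a text or tspan element holds more than one coordinate.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

def find_per_glyph_text(svg_source):
    """Return (id, number_of_x_values) for each text/tspan whose x is a list.

    Hypothetical helper: it only inspects the 'x' attribute.
    """
    root = ET.fromstring(svg_source)
    wanted = (f"{{{SVG_NS}}}text", f"{{{SVG_NS}}}tspan")
    hits = []
    for elem in root.iter():
        if elem.tag not in wanted:
            continue
        # SVG allows the coordinates to be separated by spaces and/or commas.
        values = elem.get("x", "").replace(",", " ").split()
        if len(values) > 1:
            hits.append((elem.get("id"), len(values)))
    return hits

# Cut-down stand-in for the attached file: one PDF-style line, one typed line.
sample = """<svg xmlns="http://www.w3.org/2000/svg">
  <text id="text3050" x="268.96753" y="-651.39398"><tspan id="tspan3052"
    x="268.96753 274.36633 279.76511" y="-651.39398">), B</tspan></text>
  <text id="text3038" x="162.36569" y="500.24118"><tspan id="tspan3040"
    x="162.36569" y="500.24118">This text was not dragged.</tspan></text>
</svg>"""

print(find_per_glyph_text(sample))  # [('tspan3052', 3)]
```

Only the tspan carrying the coordinate list is reported; the typed line, with its single "x" value, is not.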

Do the experiment. With an editor, eliminate all but the first entry in that long list. Open that in inkscape, save as EMF, insert into PPT. Good, it stayed text! That can now be selected and edited normally within PPT.
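The hand edit could also be scripted. The following is a minimal sketch, not a tested workflow: the function name `keep_first_x` and the sample document are invented for illustration, and the namespaces are registered up front only so that Python's ElementTree keeps the usual prefixes when it writes the file back out. As later posts in this thread report, dropping the coordinate list can also shift text on the page, so the result needs checking in Inkscape.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Keep the familiar prefixes on output; otherwise ElementTree writes ns0:, ns1:, ...
ET.register_namespace("", SVG_NS)
ET.register_namespace("sodipodi", "http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd")
ET.register_namespace("inkscape", "http://www.inkscape.org/namespaces/inkscape")

def keep_first_x(svg_source):
    """Reduce every multi-value 'x' on text/tspan elements to its first entry.

    Hypothetical helper. Returns (new_svg_text, number_of_elements_changed).
    """
    root = ET.fromstring(svg_source)
    wanted = (f"{{{SVG_NS}}}text", f"{{{SVG_NS}}}tspan")
    changed = 0
    for elem in root.iter():
        if elem.tag not in wanted:
            continue
        values = elem.get("x", "").replace(",", " ").split()
        if len(values) > 1:
            elem.set("x", values[0])  # same edit as done by hand above
            changed += 1
    return ET.tostring(root, encoding="unicode"), changed

# Invented sample with one PDF-style line.
sample = """<svg xmlns="http://www.w3.org/2000/svg"
     xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd">
  <text id="text3050" x="268.96753" y="-651.39398"><tspan id="tspan3052"
    sodipodi:role="line"
    x="268.96753 274.36633 279.76511" y="-651.39398">), B</tspan></text>
</svg>"""

new_svg, count = keep_first_x(sample)
print(count, "element(s) changed")
print(new_svg)
```

For a real file one would read it with `open(...).read()`, pass the text through the function, and write the result to a new file so the original stays untouched.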

Now, is there some way to convert text with lists of X to just the first X, without resorting to editing the SVG file???

Code:

<text
   xml:space="preserve"
   style="font-size:16px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:start;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:start;fill:#000000;fill-opacity:1;stroke:none;font-family:Arial;-inkscape-font-specification:Arial"
   x="162.36569"
   y="500.24118"
   id="text3038"
   sodipodi:linespacing="125%"
   transform="matrix(0.8,0,0,-0.8,0,792)"><tspan
     sodipodi:role="line"
     id="tspan3040"
     x="162.36569"
     y="500.24118">This text was not dragged.</tspan></text>
Attachments
saf_test.svg
modified again, x list reduced to x one value
(7.63 KiB) Downloaded 249 times

~suv
Posts: 2272
Joined: Sun May 10, 2009 2:07 am

Re: Two kinds of text???

Postby ~suv » Thu Oct 27, 2011 11:07 am

mathog wrote:So open page in a browser, print to pdf.

Opened the PDF in Inkscape which works fairly well. Can edit the resulting objects and text _seems_ to be text. Can click the "A" icon and select a few words by sliding over them.
mathog wrote:Strangely, if in the same inkscape document, after the PDF is imported, a line of text like "This is normal text" is created using the "Create and Edit Text Objects" button, and it is exported along with the imported PDF text, that line is imported into PPT correctly. So Inkscape seems to have two types of text, and the imported PDF text is somehow cursed as far as export to EMF/WMF is concerned.
These are not really "two kinds of text". The difference is that text imported from a PDF file into Inkscape is absolutely kerned: each letter is absolutely positioned on the page.

Please read the notes on 'Importing PDF > Editing text' in the release notes of Inkscape 0.46, which explain why (this also applies to the current version 0.48.2):
PDF and AI import > Text editing tips wrote:Text editing tips: Any text imported from PDF or AI has each letter's precise place on the page fixed. While this preserves the exact appearance (e.g. justification of text blocks) of the imported document, it makes editing such text difficult: deleting text fails to contract the text line and inserting text fails to expand it, i.e. typed letters overlay the existing letters. (However, you still can replace a letter with another letter of about the same width, although you may need to kern it into place with Alt+arrows.)

To work around this, select the text object you want to edit and use Text > Remove manual kerns command. This will remove the exact positioning information, so if the text block was justified it will lose justification, but instead you will be able to edit it as usual.

Note that there is a way to select even a single line in a text block. For this, open the XML editor, expand the <svg:text> tree branch corresponding to your text, and select any of the <svg:tspan> objects under it. Now you can remove manual kerns from this line only. After you finish editing the line, you can manually justify it back, for example by adding spaces, manual kerns (Alt+arrows), or by adjusting letterspacing (select the whole line and use Alt+> or Alt+<).

The native PDF/AI importer is based on the poppler library and was implemented by Miklós Erdélyi as part of the Google Summer of Code 2007.

Re: Two kinds of text???

Postby mathog » Fri Oct 28, 2011 7:33 am

Right, manual kerning. Unfortunately,

select all
text -> remove manual kerning

massacres the layout of the text. Bizarrely, some of the text moves vertically, and some of it moves to odd positions horizontally. Undo "remove manual kerns" and the page is NOT restored.

See the attached images for all of these effects. Hmm, they are showing up in the wrong order. The order should be before.png, unkerned.png, and then undo_unkern.png.
Attachments
undo_unkern.png
Undo the remove manual kerns - result is a big mess.
undo_unkern.png (76.64 KiB) Viewed 2006 times
unkerned.png
After text -> remove manual kerns
unkerned.png (77.24 KiB) Viewed 2006 times
before.png
Imported from PDF, edited down to just a couple of lines
before.png (77.33 KiB) Viewed 2006 times

Re: Two kinds of text???

Postby mathog » Fri Oct 28, 2011 7:53 am

Well this is interesting. Opened a copy of the SVG and edited out the "X" kerning information - and the same thing happened as with text -> remove manual kerning. Bizarre. This SVG file is attached. Used diff to verify that no y value had been touched:

Code:

$ diff saf_test_kernissue.svg saf_test_kernissue_edited_to_unkern.svg
127c127
<      x="126.1731 131.5719 136.97064"
---
>      x="126.1731"
138c138
<      x="142.37299 147.89178 153.29053 158.68933"
---
>      x="142.37299"
153c153
<      x="169.49487 175.01367 180.41241 185.81122"
---
>      x="169.49487"
158c158
<      x="191.21527 196.73407 202.13287"
---
>      x="191.21527"
173c173
<      x="212.93488 218.33362 223.85242 229.25122 234.65002 240.04883"
---
>      x="212.93488"
182c182
<      x="142.37482 147.89362 153.29236 158.69116 164.08997 169.48877 175.00757 180.40631 185.80511 191.20392 196.72272                     202.12152 207.52026 212.91907 218.31787 223.83667 229.23547 234.63422 240.03302 245.55182 250.95062 256.34943"
---
>      x="142.37482"


Yet WHO definitely moves from one line to another, just as it did for the other method.
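The same check can be made per element rather than per line of the file. This is a rough sketch only: the function name `attribute_changes`, the element ids and the "y" values in the sample are placeholders, not taken from the attached files. It lists which of a few positioning attributes differ between two versions of an SVG, matched by "id". It can confirm that only "x" was edited, but it says nothing about why Inkscape then draws the text somewhere else.

```python
import xml.etree.ElementTree as ET

def attribute_changes(before_svg, after_svg, names=("x", "y", "transform")):
    """List (id, attribute, old, new) for attributes that differ between files.

    Hypothetical helper: elements are matched by their 'id' attribute only.
    """
    def index(source):
        return {e.get("id"): e for e in ET.fromstring(source).iter() if e.get("id")}

    before, after = index(before_svg), index(after_svg)
    changes = []
    for elem_id, old in before.items():
        new = after.get(elem_id)
        if new is None:
            continue  # element missing from the edited file
        for name in names:
            if old.get(name) != new.get(name):
                changes.append((elem_id, name, old.get(name), new.get(name)))
    return changes

# Placeholder documents: ids and y values are made up, x values follow the diff.
before = """<svg xmlns="http://www.w3.org/2000/svg">
  <text id="text1" x="126.1731" y="100"><tspan id="tspan1"
    x="126.1731 131.5719 136.97064" y="100">WHO</tspan></text>
</svg>"""
after = before.replace('x="126.1731 131.5719 136.97064"', 'x="126.1731"')

print(attribute_changes(before, after))
# [('tspan1', 'x', '126.1731 131.5719 136.97064', '126.1731')]
```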
Attachments
saf_test_kernissue_edited_to_unkern.svg
(8.21 KiB) Downloaded 256 times

