How to extract data from a Word document?
For a few days I have been trying to modify a macro so that it extracts data from Microsoft Word documents. When I run the macro below, I get an error right at the beginning, on the lines listed after the code:
Code:
Sub GetDataFromWord()
    'Set a reference (Tools - References) to the
    'Microsoft Word x.0 Object Library
    Dim wdApp As Word.Application
    Dim wdDoc As Word.Document
    Dim sFile As String
    Dim rInput As Range
    'Define row and column of data in table
    Const lROW As Long = 2
    Const lCOL As Long = 2
    'Specify file that contains table
    sFile = "C:\Documents and Settings\Hashemi\Desktop\Macro test\test.doc"
    'Instantiate Word and open document
    Set wdApp = New Word.Application
    Set wdDoc = wdApp.Documents.Open(sFile)
    'Define range where data goes
    Set rInput = Sheet1.Range("a1")
    'Copy value from table and paste to cell
    With wdDoc.Tables(1)
        .Cell(lROW, lCOL).Range.Copy
        rInput.PasteSpecial xlPasteValues
    End With
    wdDoc.Close False
    wdApp.Quit
    Set wdDoc = Nothing
    Set wdApp = Nothing
End Sub
The error appears at the comment:
'Set a reference (Tools - References) to the
'Microsoft Word x.0 Object Library
and also at:
Dim wdApp As Word.Application
How should I solve this?
Re: How to extract data from a Word document?
You can extract data from Microsoft Word with many languages. Perl and Python, for example, have COM modules that let you automate Word. The Python script below opens every .doc file in a folder and saves a plain-text copy of it.
Code:
import glob
import os
import win32com.client

WD_FORMAT_TEXT = 2  # value of Word's wdFormatText constant

wordapp = win32com.client.Dispatch("Word.Application")
path = "D:\\mydir"
try:
    for name in glob.glob(os.path.join(path, "*.doc")):
        doc = os.path.abspath(name)
        print("processing", doc)
        wddoc = wordapp.Documents.Open(doc)
        # Save a plain-text copy next to the original
        txt = os.path.splitext(doc)[0] + ".txt"
        wddoc.SaveAs(txt, FileFormat=WD_FORMAT_TEXT)
        wddoc.Close()
finally:
    # Always shut Word down, even if a document fails to open
    wordapp.Quit()
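If you read a table cell directly through COM instead of copying it via the clipboard, the text Word returns for a cell ends with an end-of-cell marker (a carriage return followed by character 7). Here is a small sketch of how that marker could be removed. The helper name clean_cell_text is my own, and the sample string only imitates what a cell's Range.Text would contain.

```python
def clean_cell_text(raw: str) -> str:
    """Strip Word's end-of-cell marker and normalise paragraph marks."""
    # Word ends every cell's text with "\r\x07" (paragraph mark + cell mark).
    if raw.endswith("\r\x07"):
        raw = raw[:-2]
    # Paragraph marks inside the cell are bare carriage returns.
    return raw.replace("\r", "\n").strip()


if __name__ == "__main__":
    # Imitates the raw text of a one-line table cell.
    print(repr(clean_cell_text("Total\r\x07")))  # prints 'Total'
```

You would pass it the raw text of the cell you are interested in, for example row 2, column 2 of the first table as in the macro from the question.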
Re: How to extract data from a Word document?
You can use the script below to extract the text of a Microsoft Word document word by word. It is a VBScript that starts with Option Explicit; run it with Windows Script Host (for example cscript).
Code:
Option Explicit
REM We use "Option Explicit" to help us check for coding mistakes
REM the Word Application
Dim objWord
REM the path to the Word file
Dim wordPath
REM the document we are currently reading data from
Dim currentDocument
REM the number of Words in the current document
Dim numberOfWords
Dim i
REM where is the Word file located?
wordPath = "C:\Data\Doc1.doc"
WScript.Echo "Extract Data from " & wordPath
REM Create an invisible version of Microsoft Word
Set objWord = CreateObject("Word.Application")
REM don't display any messages about documents needing to be converted
REM from old Word file formats
objWord.DisplayAlerts = 0
REM open the Word document as read-only
REM Open(path, ConfirmConversions, ReadOnly)
objWord.Documents.Open wordPath, false, true
REM Access the document
Set currentDocument = objWord.Documents(1)
REM How many words are in the document
numberOfWords = currentDocument.Words.Count
WScript.Echo "There are " & numberOfWords & " words " & vbCrLf
For i = 1 To numberOfWords
    WScript.Echo currentDocument.Words(i)
Next
REM Close the document
currentDocument.Close
REM Free memory used to store the document object
Set currentDocument = Nothing
REM exit Microsoft Word
objWord.Quit
Set objWord = Nothing
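Re: How to extract data from a Word document?
If the code above does not help you extract data from Microsoft Word, then try this code. I have edited it for you, so you only need to copy and paste the VBA code below into Word. For every document in the folder you pick, it finds the "DTE/" reference and the "Subject :" line and adds them to a new table row in the data document.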
Code:
Sub ExtractData()
    Dim sDTE As String
    Dim sSubject As String
    Dim strFileName As String
    Dim strPath As String
    Dim oDoc As Document
    Dim dataDoc As Document
    Dim fDialog As FileDialog
    Set fDialog = Application.FileDialog(msoFileDialogFolderPicker)
    'Pick the folder with the letters
    With fDialog
        .Title = "Select Folder containing the documents to be modified and click OK"
        .AllowMultiSelect = False
        .InitialView = msoFileDialogViewList
        If .Show <> -1 Then
            MsgBox "Cancelled By User"
            Exit Sub
        End If
        strPath = .SelectedItems.Item(1)
        If Right(strPath, 1) <> "\" Then strPath = strPath & "\"
    End With
    'Close any open documents
    If Documents.Count > 0 Then
        Documents.Close SaveChanges:=wdPromptToSaveChanges
    End If
    strFileName = Dir$(strPath & "*.do?")
    'Open the document that collects the data
    Documents.Open "D:\My Documents\Test\DTE data.doc"
    Set dataDoc = ActiveDocument
    'Open the letters in turn
    While strFileName <> ""
        Set oDoc = Documents.Open(strPath & strFileName)
        'Reset the values so a letter without a match does not reuse old data
        sDTE = ""
        sSubject = ""
        Selection.HomeKey wdStory 'Start from the top of the letter
        With Selection.Find 'Find the first string
            .ClearFormatting
            Do While .Execute(findText:="DTE/*^13", _
                    MatchWildcards:=True, _
                    Wrap:=wdFindStop, Forward:=True) = True
                'Assign the found text to a variable and chop off
                'the last character (the paragraph mark)
                sDTE = Left(Selection.Range, Len(Selection.Range) - 1)
            Loop
        End With
        Selection.HomeKey wdStory 'Start from the top of the letter
        With Selection.Find 'Find the second string
            .ClearFormatting
            Do While .Execute(findText:="Subject :*^13", _
                    MatchWildcards:=True, _
                    Wrap:=wdFindStop, Forward:=True) = True
                'Assign the second string to a variable and chop off
                'the last character and the leading text
                sSubject = Mid(Selection.Range, 10, Len(Selection.Range) - 10)
            Loop
        End With
        'Switch to the data document and add the content of
        'the variables to the blank row of the table
        dataDoc.Activate
        With Selection
            .EndKey wdStory
            .MoveUp Unit:=wdLine, Count:=1
            .MoveRight Unit:=wdCell, Count:=2 'Add a new blank row
            .TypeText Text:=sDTE
            .MoveRight Unit:=wdCell
            .TypeText Text:=sSubject
        End With
        'Close the letter without saving
        oDoc.Close SaveChanges:=wdDoNotSaveChanges
        Set oDoc = Nothing
        strFileName = Dir$()
    Wend
    'Save the data document
    dataDoc.Save
End Sub
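If the letters have already been saved as plain text, the same two searches can be done without Word. Below is a rough Python sketch of the idea, not a replacement for the macro. The function name extract_fields and the sample letter are made up for illustration. Like the macro's loops, it keeps the last match when a pattern occurs more than once; unlike the macro, it also trims surrounding spaces.

```python
import re


def extract_fields(text: str) -> dict:
    """Return the last 'DTE/...' reference and 'Subject :' value in text."""
    # Each pattern runs to the end of the line, like "DTE/*^13" in Word.
    dte_matches = re.findall(r"DTE/[^\r\n]*", text)
    # The group drops the leading "Subject :" text, as Mid(..., 10, ...) does.
    subject_matches = re.findall(r"Subject :([^\r\n]*)", text)
    return {
        "dte": dte_matches[-1].strip() if dte_matches else "",
        "subject": subject_matches[-1].strip() if subject_matches else "",
    }


if __name__ == "__main__":
    # Made-up sample letter, only to show the shape of the result.
    letter = "Ref: DTE/2024/017\nSubject : Annual inspection\nDear Sir,\n"
    print(extract_fields(letter))
```

A letter without a match gives empty strings, so nothing is carried over from the previous letter.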
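Re: How to extract data from a Word document?
If none of the above helps, copy and paste the VBA code below into a module in your Excel workbook. After that you also have to set a reference to the Microsoft Word Object Library (Tools - References in the VBA editor); without it, types such as Word.Application are not recognized. The macro writes the form field results of every document in a folder to the sheet, one row per document.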
Code:
Sub CollateForms()
    Dim myPath As String
    Dim myWord As New Word.Application
    Dim myDoc As Word.Document
    Dim myField As Word.FormField
    Dim n As Long, m As Long
    Dim fs, f, f1, fc
    Range("A2").Select
    myPath = InputBox("Path?")
    Set fs = CreateObject("Scripting.FileSystemObject")
    Set f = fs.GetFolder(myPath)
    Set fc = f.Files
    'm is the row offset (one row per document), n the column offset
    m = 0
    For Each f1 In fc
        n = 0
        Set myDoc = myWord.Documents.Open(myPath & "\" & f1.Name)
        For Each myField In myDoc.FormFields
            ActiveCell.Offset(m, n).Value = myField.Result
            n = n + 1
        Next
        myDoc.Close wdDoNotSaveChanges
        m = m + 1
    Next
    'Quit Word so no hidden instance is left running
    myWord.Quit
    Set myField = Nothing
    Set myDoc = Nothing
    Set myWord = Nothing
End Sub
Re: How to extract data from a Word document?
I wrote the following code for you. It lists the .doc files found in a directory in the Excel sheet (column G of Sheet1), so that you can work with them from there.
Code:
Sub LoadWordDoc()
    Dim F As String
    Dim x As Long
    Dim FolderYear As String
    Dim DocMonth As String
    Dim DocPath As String
    Dim FName()
    Dim Ext As String
    '///Load variable values
    DocMonth = Sheet1.Range("b1")
    FolderYear = Sheet1.Range("b2")
    '///Build the path to search for files
    DocPath = "C:\Documents and Settings\montefem\My Documents\Excel Test\word Document\" & FolderYear & "\"
    '///DocMonth is not used yet. To list only the documents with the month
    '///(e.g. Feb) in the name, one option is Ext = "*" & DocMonth & "*.doc"
    Ext = "*.doc"
    '///Get the first matching file; each later Dir() call returns the next one
    ChDir DocPath
    F = Dir(DocPath & Ext)
    x = 2
    '///Clear previous values
    Sheet1.Range("g2:g100").ClearContents
    '///Place the file names in column G of the settings sheet to manipulate them
    Sheet1.Activate
    With Sheet1
        Do While Len(F) > 0
            ReDim Preserve FName(2, x)
            FName(2, x) = DocPath & F
            .Cells(x, "G") = FName(2, x)
            x = x + 1
            F = Dir()
        Loop
    End With
    '///x is still 2 when no file was found
    If x = 2 Then MsgBox "No Files"
End Sub
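About listing only the documents for one month: the idea is to put the month text in the middle of the wildcard pattern. Here is a minimal Python sketch of that filter. The function name files_for_month and the file names are invented for the example, and I made the comparison ignore case, which is my own choice.

```python
from fnmatch import fnmatchcase


def files_for_month(names, month):
    """Return the .doc names that contain `month`, ignoring case."""
    # Same shape as a "*Feb*.doc" wildcard, compared in lower case.
    pattern = f"*{month.lower()}*.doc"
    return [n for n in names if fnmatchcase(n.lower(), pattern)]


if __name__ == "__main__":
    # Invented file names, only to show which ones are kept.
    names = ["Report Feb 2010.doc", "Report Mar 2010.doc",
             "feb-summary.doc", "Feb notes.txt"]
    print(files_for_month(names, "Feb"))
```

The month text can come from wherever you keep it, in the macro that is cell B1.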